LGFeb 17Code
GLM-5: from Vibe Coding to Agentic EngineeringGLM-5 Team, Aohan Zeng, Xin Lv et al. · tsinghua
We present GLM-5, a next-generation foundation model designed to transition the paradigm of vibe coding to agentic engineering. Building upon the agentic, reasoning, and coding (ARC) capabilities of its predecessor, GLM-5 adopts DSA to significantly reduce training and inference costs while maintaining long-context fidelity. To advance model alignment and autonomy, we implement a new asynchronous reinforcement learning infrastructure that drastically improves post-training efficiency by decoupling generation from training. Furthermore, we propose novel asynchronous agent RL algorithms that further improve RL quality, enabling the model to learn from complex, long-horizon interactions more effectively. Through these innovations, GLM-5 achieves state-of-the-art performance on major open benchmarks. Most critically, GLM-5 demonstrates unprecedented capability in real-world coding tasks, surpassing previous baselines in handling end-to-end software engineering challenges. Code, models, and more information are available at https://github.com/zai-org/GLM-5.
LGSep 22, 2023Code
Understanding Deep Gradient Leakage via Inversion Influence FunctionsHaobo Zhang, Junyuan Hong, Yuyang Deng et al.
Deep Gradient Leakage (DGL) is a highly effective attack that recovers private training images from gradient vectors. This attack casts significant privacy challenges on distributed learning from clients with sensitive data, where clients are required to share gradients. Defending against such attacks requires but lacks an understanding of when and how privacy leakage happens, mostly because of the black-box nature of deep networks. In this paper, we propose a novel Inversion Influence Function (I$^2$F) that establishes a closed-form connection between the recovered images and the private gradients by implicitly solving the DGL problem. Compared to directly solving DGL, I$^2$F is scalable for analyzing deep networks, requiring only oracle access to gradients and Jacobian-vector products. We empirically demonstrate that I$^2$F effectively approximated the DGL generally on different model architectures, datasets, modalities, attack implementations, and perturbation-based defenses. With this novel tool, we provide insights into effective gradient perturbation directions, the unfairness of privacy protection, and privacy-preferred model initialization. Our codes are provided in https://github.com/illidanlab/inversion-influence-function.
LGFeb 7, 2023Code
A Privacy-Preserving Hybrid Federated Learning Framework for Financial Crime DetectionHaobo Zhang, Junyuan Hong, Fan Dong et al.
The recent decade witnessed a surge of increase in financial crimes across the public and private sectors, with an average cost of scams of $102m to financial institutions in 2022. Developing a mechanism for battling financial crimes is an impending task that requires in-depth collaboration from multiple institutions, and yet such collaboration imposed significant technical challenges due to the privacy and security requirements of distributed financial data. For example, consider the modern payment network systems, which can generate millions of transactions per day across a large number of global institutions. Training a detection model of fraudulent transactions requires not only secured transactions but also the private account activities of those involved in each transaction from corresponding bank systems. The distributed nature of both samples and features prevents most existing learning systems from being directly adopted to handle the data mining task. In this paper, we collectively address these challenges by proposing a hybrid federated learning system that offers secure and privacy-aware learning and inference for financial crime detection. We conduct extensive empirical studies to evaluate the proposed framework's detection performance and privacy-protection capability, evaluating its robustness against common malicious attacks of collaborative learning. We release our source code at https://github.com/illidanlab/HyFL .
84.5LGMay 21
Molecular Lead Optimization via Agentic Tool PlanningLingxiao Li, Haobo Zhang, Ruohao Fan et al.
Drug discovery is a lengthy and resource-intensive process composed of multiple stages. Among these stages, lead optimization plays a critical role in transforming early hit compounds into viable drug candidates. This stage requires improving ADMET-related properties through subtle structural refinement while preserving key molecular substructures responsible for binding affinity to disease targets. Recent advances in artificial intelligence have shown promise in accelerating various aspects of drug discovery; however, most existing approaches to lead optimization rely on one-step molecular optimization, which fail to account for the long-term consequences of sequential design decisions. To address this limitation, we propose TRACE, a trajectory-aware, LLM-reasoning agent for molecular lead optimization that formulates tool selection as a sequential decision-making problem over action trajectories. Given a lead molecule and an optimization objective, TRACE makes trajectory-aware decisions over molecular optimization tools, enabling forward-looking refinement under structural constraints. Experiments on multiple ADMET optimization tasks show that our agent achieves higher optimization success, larger property improvements, and higher validity, while preserving molecular similarity compared to baseline models.
STMar 27, 2023
On the Optimality of Misspecified Spectral AlgorithmsHaobo Zhang, Yicheng Li, Qian Lin
In the misspecified spectral algorithms problem, researchers usually assume the underground true function $f_ρ^{*} \in [\mathcal{H}]^{s}$, a less-smooth interpolation space of a reproducing kernel Hilbert space (RKHS) $\mathcal{H}$ for some $s\in (0,1)$. The existing minimax optimal results require $\|f_ρ^{*}\|_{L^{\infty}}<\infty$ which implicitly requires $s > α_{0}$ where $α_{0}\in (0,1)$ is the embedding index, a constant depending on $\mathcal{H}$. Whether the spectral algorithms are optimal for all $s\in (0,1)$ is an outstanding problem lasting for years. In this paper, we show that spectral algorithms are minimax optimal for any $α_{0}-\frac{1}β < s < 1$, where $β$ is the eigenvalue decay rate of $\mathcal{H}$. We also give several classes of RKHSs whose embedding index satisfies $ α_0 = \frac{1}β $. Thus, the spectral algorithms are minimax optimal for all $s\in (0,1)$ on these RKHSs.
LGSep 23, 2023
On the Asymptotic Learning Curves of Kernel Ridge Regression under Power-law DecayYicheng Li, Haobo Zhang, Qian Lin
The widely observed 'benign overfitting phenomenon' in the neural network literature raises the challenge to the 'bias-variance trade-off' doctrine in the statistical learning theory. Since the generalization ability of the 'lazy trained' over-parametrized neural network can be well approximated by that of the neural tangent kernel regression, the curve of the excess risk (namely, the learning curve) of kernel ridge regression attracts increasing attention recently. However, most recent arguments on the learning curve are heuristic and are based on the 'Gaussian design' assumption. In this paper, under mild and more realistic assumptions, we rigorously provide a full characterization of the learning curve: elaborating the effect and the interplay of the choice of the regularization parameter, the source condition and the noise. In particular, our results suggest that the 'benign overfitting phenomenon' exists in very wide neural networks only when the noise level is small.
MLSep 8, 2023
Optimal Rate of Kernel Regression in Large DimensionsWeihao Lu, Haobo Zhang, Yicheng Li et al.
We perform a study on kernel regression for large-dimensional data (where the sample size $n$ is polynomially depending on the dimension $d$ of the samples, i.e., $n\asymp d^γ$ for some $γ>0$ ). We first build a general tool to characterize the upper bound and the minimax lower bound of kernel regression for large dimensional data through the Mendelson complexity $\varepsilon_{n}^{2}$ and the metric entropy $\bar{\varepsilon}_{n}^{2}$ respectively. When the target function falls into the RKHS associated with a (general) inner product model defined on $\mathbb{S}^{d}$, we utilize the new tool to show that the minimax rate of the excess risk of kernel regression is $n^{-1/2}$ when $n\asymp d^γ$ for $γ=2, 4, 6, 8, \cdots$. We then further determine the optimal rate of the excess risk of kernel regression for all the $γ>0$ and find that the curve of optimal rate varying along $γ$ exhibits several new phenomena including the multiple descent behavior and the periodic plateau behavior. As an application, For the neural tangent kernel (NTK), we also provide a similar explicit description of the curve of optimal rate. As a direct corollary, we know these claims hold for wide neural networks as well.
97.2CRApr 7Code
AttnDiff: Attention-based Differential Fingerprinting for Large Language ModelsHaobo Zhang, Zhenhua Xu, Junxian Li et al.
Protecting the intellectual property of open-weight large language models (LLMs) requires verifying whether a suspect model is derived from a victim model despite common laundering operations such as fine-tuning (including PPO/DPO), pruning/compression, and model merging. We propose \textsc{AttnDiff}, a data-efficient white-box framework that extracts fingerprints from models via intrinsic information-routing behavior. \textsc{AttnDiff} probes minimally edited prompt pairs that induce controlled semantic conflicts, captures differential attention patterns, summarizes them with compact spectral descriptors, and compares models using CKA. Across Llama-2/3 and Qwen2.5 (3B--14B) and additional open-source families, it yields high similarity for related derivatives while separating unrelated model families (e.g., $>0.98$ vs.\ $<0.22$ with $M=60$ probes). With 5--60 multi-domain probes, it supports practical provenance verification and accountability.
CRJan 13Code
ForgetMark: Stealthy Fingerprint Embedding via Targeted Unlearning in Language ModelsZhenhua Xu, Haobo Zhang, Zhebo Wang et al.
Existing invasive (backdoor) fingerprints suffer from high-perplexity triggers that are easily filtered, fixed response patterns exposed by heuristic detectors, and spurious activations on benign inputs. We introduce \textsc{ForgetMark}, a stealthy fingerprinting framework that encodes provenance via targeted unlearning. It builds a compact, human-readable key--value set with an assistant model and predictive-entropy ranking, then trains lightweight LoRA adapters to suppress the original values on their keys while preserving general capabilities. Ownership is verified under black/gray-box access by aggregating likelihood and semantic evidence into a fingerprint success rate. By relying on probabilistic forgetting traces rather than fixed trigger--response patterns, \textsc{ForgetMark} avoids high-perplexity triggers, reduces detectability, and lowers false triggers. Across diverse architectures and settings, it achieves 100\% ownership verification on fingerprinted models while maintaining standard performance, surpasses backdoor baselines in stealthiness and robustness to model merging, and remains effective under moderate incremental fine-tuning. Our code and data are available at \href{https://github.com/Xuzhenhua55/ForgetMark}{https://github.com/Xuzhenhua55/ForgetMark}.
83.2CRMay 24
MemMark: State-Evolution Attribution Watermarking for Agent Long-Term Memory SystemsHaobo Zhang, Xutao Mao, Guangyuan Dong et al.
Memory-backed agents need provenance that can survive leaked or migrated snapshots, where logs, visible outputs, and trusted metadata may be absent. We propose MemMark, a state-evolution attribution watermark that embeds an owner-controlled signal into latent memory-write decisions. At each internal LLM call, MemMark samples among admissible candidates using keyed, distribution-preserving selection, and records cryptographic commitments with signed session anchors and reveal evidence. This makes attribution depend on reproducible backend behavior rather than mutable provenance fields. Across A-Mem and Graphiti on LoCoMo, with three LLM backbones, MemMark preserves memory utility: Overall F1 retains 99.6% of the unwatermarked baseline, while BLEU-1 changes by +0.2%. It also provides usable carrier capacity, with 1.16, 1.14, and 1.26 bits of mean entropy for update-target, link-target, and semantic-realization decisions. In the snapshot-only R3 setting, MemMark recovers the full 40-bit payload from final snapshots, while wrong-key verification remains near chance. Under nine memory-lifecycle attacks, verification distinguishes tampering, evidence deletion, and partial payload recovery. These results show that robust snapshot-only attribution is feasible for long-term agent memory without surviving traces, trusted metadata, or utility-degrading.
CVSep 4, 2023
Safe and Robust Watermark Injection with a Single OoD ImageShuyang Yu, Junyuan Hong, Haobo Zhang et al.
Training a high-performance deep neural network requires large amounts of data and computational resources. Protecting the intellectual property (IP) and commercial ownership of a deep model is challenging yet increasingly crucial. A major stream of watermarking strategies implants verifiable backdoor triggers by poisoning training samples, but these are often unrealistic due to data privacy and safety concerns and are vulnerable to minor model changes such as fine-tuning. To overcome these challenges, we propose a safe and robust backdoor-based watermark injection technique that leverages the diverse knowledge from a single out-of-distribution (OoD) image, which serves as a secret key for IP verification. The independence of training data makes it agnostic to third-party promises of IP security. We induce robustness via random perturbation of model parameters during watermark injection to defend against common watermark removal attacks, including fine-tuning, pruning, and model extraction. Our experimental results demonstrate that the proposed watermarking approach is not only time- and sample-efficient without training data, but also robust against the watermark removal attacks above.
79.5CRApr 7
Copyright Protection for Large Language Models: A Survey of Methods, Challenges, and TrendsZhenhua Xu, Xubin Yue, Zhebo Wang et al.
Copyright protection for large language models is of critical importance, given their substantial development costs, proprietary value, and potential for misuse. Existing surveys have predominantly focused on techniques for tracing LLM-generated content-namely, text watermarking-while a systematic exploration of methods for protecting the models themselves (i.e., model watermarking and model fingerprinting) remains absent. Moreover, the relationships and distinctions among text watermarking, model watermarking, and model fingerprinting have not been comprehensively clarified. This work presents a comprehensive survey of the current state of LLM copyright protection technologies, with a focus on model fingerprinting, covering the following aspects: (1) clarifying the conceptual connection from text watermarking to model watermarking and fingerprinting, and adopting a unified terminology that incorporates model watermarking into the broader fingerprinting framework; (2) providing an overview and comparison of diverse text watermarking techniques, highlighting cases where such methods can function as model fingerprinting; (3) systematically categorizing and comparing existing model fingerprinting approaches for LLM copyright protection; (4) presenting, for the first time, techniques for fingerprint transfer and fingerprint removal; (5) summarizing evaluation metrics for model fingerprints, including effectiveness, harmlessness, robustness, stealthiness, and reliability; and (6) discussing open challenges and future research directions. This survey aims to offer researchers a thorough understanding of both text watermarking and model fingerprinting technologies in the era of LLMs, thereby fostering further advances in protecting their intellectual property.
97.9IRMar 10
RecThinker: An Agentic Framework for Tool-Augmented Reasoning in RecommendationHaobo Zhang, Yutao Zhu, Kelong Mao et al.
Large Language Models (LLMs) have revolutionized recommendation agents by providing superior reasoning and flexible decision-making capabilities. However, existing methods mainly follow a passive information acquisition paradigm, where agents either rely on static pre-defined workflows or perform reasoning with constrained information. It limits the agent's ability to identify information sufficiency, often leading to suboptimal recommendations when faced with fragmented user profiles or sparse item metadata. To address these limitations, we propose RecThinker, an agentic framework for tool-augmented reasoning in recommendation, which shifts recommendation from passive processing to autonomous investigation by dynamically planning reasoning paths and proactively acquiring essential information via autonomous tool-use. Specifically, RecThinker adopts an Analyze-Plan-Act paradigm, which first analyzes the sufficiency of user-item information and autonomously invokes tool-calling sequences to bridge information gaps between available knowledge and reasoning requirements. We develop a suite of specialized tools for RecThinker, enabling the model to acquire user-side, item-side, and collaborative information for better reasoning and user-item matching. Furthermore, we introduce a self-augmented training pipeline, comprising a Supervised Fine-Tuning (SFT) stage to internalize high-quality reasoning trajectories and a Reinforcement Learning (RL) stage to optimize for decision accuracy and tool-use efficiency. Extensive experiments on multiple benchmark datasets demonstrate that RecThinker consistently outperforms strong baselines in the recommendation scenario.
LGMar 28, 2023
Kernel interpolation generalizes poorlyYicheng Li, Haobo Zhang, Qian Lin
One of the most interesting problems in the recent renaissance of the studies in kernel regression might be whether the kernel interpolation can generalize well, since it may help us understand the `benign overfitting henomenon' reported in the literature on deep networks. In this paper, under mild conditions, we show that for any $\varepsilon>0$, the generalization error of kernel interpolation is lower bounded by $Ω(n^{-\varepsilon})$. In other words, the kernel interpolation generalizes poorly for a large class of kernels. As a direct corollary, we can show that overfitted wide neural networks defined on the sphere generalize poorly.
LGNov 14, 2025
Chain-of-Generation: Progressive Latent Diffusion for Text-Guided Molecular DesignLingxiao Li, Haobo Zhang, Bin Chen et al.
Text-conditioned molecular generation aims to translate natural-language descriptions into chemical structures, enabling scientists to specify functional groups, scaffolds, and physicochemical constraints without handcrafted rules. Diffusion-based models, particularly latent diffusion models (LDMs), have recently shown promise by performing stochastic search in a continuous latent space that compactly captures molecular semantics. Yet existing methods rely on one-shot conditioning, where the entire prompt is encoded once and applied throughout diffusion, making it hard to satisfy all the requirements in the prompt. We discuss three outstanding challenges of one-shot conditioning generation, including the poor interpretability of the generated components, the failure to generate all substructures, and the overambition in considering all requirements simultaneously. We then propose three principles to address those challenges, motivated by which we propose Chain-of-Generation (CoG), a training-free multi-stage latent diffusion framework. CoG decomposes each prompt into curriculum-ordered semantic segments and progressively incorporates them as intermediate goals, guiding the denoising trajectory toward molecules that satisfy increasingly rich linguistic constraints. To reinforce semantic guidance, we further introduce a post-alignment learning phase that strengthens the correspondence between textual and molecular latent spaces. Extensive experiments on benchmark and real-world tasks demonstrate that CoG yields higher semantic alignment, diversity, and controllability than one-shot baselines, producing molecules that more faithfully reflect complex, compositional prompts while offering transparent insight into the generation process.
AIOct 30, 2025
The FM AgentAnnan Li, Chufan Wu, Zengle Ge et al.
Large language models (LLMs) are catalyzing the development of autonomous AI research agents for scientific and engineering discovery. We present FM Agent, a novel and general-purpose multi-agent framework that leverages a synergistic combination of LLM-based reasoning and large-scale evolutionary search to address complex real-world challenges. The core of FM Agent integrates several key innovations: 1) a cold-start initialization phase incorporating expert guidance, 2) a novel evolutionary sampling strategy for iterative optimization, 3) domain-specific evaluators that combine correctness, effectiveness, and LLM-supervised feedback, and 4) a distributed, asynchronous execution infrastructure built on Ray. Demonstrating broad applicability, our system has been evaluated across diverse domains, including operations research, machine learning, GPU kernel optimization, and classical mathematical problems. FM Agent reaches state-of-the-art results autonomously, without human interpretation or tuning -- 1976.3 on ALE-Bench (+5.2\%), 43.56\% on MLE-Bench (+4.0pp), up to 20x speedups on KernelBench, and establishes new state-of-the-art(SOTA) results on several classical mathematical problems. Beyond academic benchmarks, FM Agent shows considerable promise for both large-scale enterprise R\&D workflows and fundamental scientific research, where it can accelerate innovation, automate complex discovery processes, and deliver substantial engineering and scientific advances with broader societal impact.
MLMay 15, 2024
On the Saturation Effect of Kernel Ridge RegressionYicheng Li, Haobo Zhang, Qian Lin
The saturation effect refers to the phenomenon that the kernel ridge regression (KRR) fails to achieve the information theoretical lower bound when the smoothness of the underground truth function exceeds certain level. The saturation effect has been widely observed in practices and a saturation lower bound of KRR has been conjectured for decades. In this paper, we provide a proof of this long-standing conjecture.
LGApr 19, 2024
The phase diagram of kernel interpolation in large dimensionsHaobo Zhang, Weihao Lu, Qian Lin
The generalization ability of kernel interpolation in large dimensions (i.e., $n \asymp d^γ$ for some $γ>0$) might be one of the most interesting problems in the recent renaissance of kernel regression, since it may help us understand the 'benign overfitting phenomenon' reported in the neural networks literature. Focusing on the inner product kernel on the sphere, we fully characterized the exact order of both the variance and bias of large-dimensional kernel interpolation under various source conditions $s\geq 0$. Consequently, we obtained the $(s,γ)$-phase diagram of large-dimensional kernel interpolation, i.e., we determined the regions in $(s,γ)$-plane where the kernel interpolation is minimax optimal, sub-optimal and inconsistent.
LGJan 2, 2024
Optimal Rates of Kernel Ridge Regression under Source Condition in Large DimensionsHaobo Zhang, Yicheng Li, Weihao Lu et al.
Motivated by the studies of neural networks (e.g.,the neural tangent kernel theory), we perform a study on the large-dimensional behavior of kernel ridge regression (KRR) where the sample size $n \asymp d^γ$ for some $γ> 0$. Given an RKHS $\mathcal{H}$ associated with an inner product kernel defined on the sphere $\mathbb{S}^{d}$, we suppose that the true function $f_ρ^{*} \in [\mathcal{H}]^{s}$, the interpolation space of $\mathcal{H}$ with source condition $s>0$. We first determined the exact order (both upper and lower bound) of the generalization error of kernel ridge regression for the optimally chosen regularization parameter $λ$. We then further showed that when $0<s\le1$, KRR is minimax optimal; and when $s>1$, KRR is not minimax optimal (a.k.a. he saturation effect). Our results illustrate that the curves of rate varying along $γ$ exhibit the periodic plateau behavior and the multiple descent behavior and show how the curves evolve with $s>0$. Interestingly, our work provides a unified viewpoint of several recent works on kernel regression in the large-dimensional setting, which correspond to $s=0$ and $s=1$ respectively.
MLMar 1, 2025
On the Saturation Effects of Spectral Algorithms in Large DimensionsWeihao Lu, Haobo Zhang, Yicheng Li et al.
The saturation effects, which originally refer to the fact that kernel ridge regression (KRR) fails to achieve the information-theoretical lower bound when the regression function is over-smooth, have been observed for almost 20 years and were rigorously proved recently for kernel ridge regression and some other spectral algorithms over a fixed dimensional domain. The main focus of this paper is to explore the saturation effects for a large class of spectral algorithms (including the KRR, gradient descent, etc.) in large dimensional settings where $n \asymp d^γ$. More precisely, we first propose an improved minimax lower bound for the kernel regression problem in large dimensional settings and show that the gradient flow with early stopping strategy will result in an estimator achieving this lower bound (up to a logarithmic factor). Similar to the results in KRR, we can further determine the exact convergence rates (both upper and lower bounds) of a large class of (optimal tuned) spectral algorithms with different qualification $τ$'s. In particular, we find that these exact rate curves (varying along $γ$) exhibit the periodic plateau behavior and the polynomial approximation barrier. Consequently, we can fully depict the saturation effects of the spectral algorithms and reveal a new phenomenon in large dimensional settings (i.e., the saturation effect occurs in large dimensional setting as long as the source condition $s>τ$ while it occurs in fixed dimensional setting as long as $s>2τ$).
CLMay 28, 2025
Unraveling LoRA Interference: Orthogonal Subspaces for Robust Model MergingHaobo Zhang, Jiayu Zhou
Fine-tuning large language models (LMs) for individual tasks yields strong performance but is expensive for deployment and storage. Recent works explore model merging to combine multiple task-specific models into a single multi-task model without additional training. However, existing merging methods often fail for models fine-tuned with low-rank adaptation (LoRA), due to significant performance degradation. In this paper, we show that this issue arises from a previously overlooked interplay between model parameters and data distributions. We propose Orthogonal Subspaces for Robust model Merging (OSRM) to constrain the LoRA subspace *prior* to fine-tuning, ensuring that updates relevant to one task do not adversely shift outputs for others. Our approach can seamlessly integrate with most existing merging algorithms, reducing the unintended interference among tasks. Extensive experiments on eight datasets, tested with three widely used LMs and two large LMs, demonstrate that our method not only boosts merging performance but also preserves single-task accuracy. Furthermore, our approach exhibits greater robustness to the hyperparameters of merging. These results highlight the importance of data-parameter interaction in model merging and offer a plug-and-play solution for merging LoRA models.
LGDec 25, 2024
Towards a Statistical Understanding of Neural Networks: Beyond the Neural Tangent Kernel TheoriesHaobo Zhang, Jianfa Lai, Yicheng Li et al.
A primary advantage of neural networks lies in their feature learning characteristics, which is challenging to theoretically analyze due to the complexity of their training dynamics. We propose a new paradigm for studying feature learning and the resulting benefits in generalizability. After reviewing the neural tangent kernel (NTK) theory and recent results in kernel regression, which address the generalization issue of sufficiently wide neural networks, we examine limitations and implications of the fixed kernel theory (as the NTK theory) and review recent theoretical advancements in feature learning. Moving beyond the fixed kernel/feature theory, we consider neural networks as adaptive feature models. Finally, we propose an over-parameterized Gaussian sequence model as a prototype model to study the feature learning characteristics of neural networks.
LGMay 12, 2023
On the Optimality of Misspecified Kernel Ridge RegressionHaobo Zhang, Yicheng Li, Weihao Lu et al.
In the misspecified kernel ridge regression problem, researchers usually assume the underground true function $f_ρ^{*} \in [\mathcal{H}]^{s}$, a less-smooth interpolation space of a reproducing kernel Hilbert space (RKHS) $\mathcal{H}$ for some $s\in (0,1)$. The existing minimax optimal results require $\|f_ρ^{*}\|_{L^{\infty}}<\infty$ which implicitly requires $s > α_{0}$ where $α_{0}\in (0,1)$ is the embedding index, a constant depending on $\mathcal{H}$. Whether the KRR is optimal for all $s\in (0,1)$ is an outstanding problem lasting for years. In this paper, we show that KRR is minimax optimal for any $s\in (0,1)$ when the $\mathcal{H}$ is a Sobolev RKHS.
DCDec 17, 2021
From Deterioration to Acceleration: A Calibration Approach to Rehabilitating Step Asynchronism in Federated OptimizationFeijie Wu, Song Guo, Haozhao Wang et al.
In the setting of federated optimization, where a global model is aggregated periodically, step asynchronism occurs when participants conduct model training by efficiently utilizing their computational resources. It is well acknowledged that step asynchronism leads to objective inconsistency under non-i.i.d. data, which degrades the model's accuracy. To address this issue, we propose a new algorithm FedaGrac, which calibrates the local direction to a predictive global orientation. Taking advantage of the estimated orientation, we guarantee that the aggregated model does not excessively deviate from the global optimum while fully utilizing the local updates of faster nodes. We theoretically prove that FedaGrac holds an improved order of convergence rate than the state-of-the-art approaches and eliminates the negative effect of step asynchronism. Empirical results show that our algorithm accelerates the training and enhances the final accuracy.
IVSep 25, 2021
Joint Progressive and Coarse-to-fine Registration of Brain MRI via Deformation Field Integration and Non-Rigid Feature FusionJinxin Lv, Zhiwei Wang, Hongkuan Shi et al.
Registration of brain MRI images requires to solve a deformation field, which is extremely difficult in aligning intricate brain tissues, e.g., subcortical nuclei, etc. Existing efforts resort to decomposing the target deformation field into intermediate sub-fields with either tiny motions, i.e., progressive registration stage by stage, or lower resolutions, i.e., coarse-to-fine estimation of the full-size deformation field. In this paper, we argue that those efforts are not mutually exclusive, and propose a unified framework for robust brain MRI registration in both progressive and coarse-to-fine manners simultaneously. Specifically, building on a dual-encoder U-Net, the fixed-moving MRI pair is encoded and decoded into multi-scale deformation sub-fields from coarse to fine. Each decoding block contains two proposed novel modules: i) in Deformation Field Integration (DFI), a single integrated sub-field is calculated, warping by which is equivalent to warping progressively by sub-fields from all previous decoding blocks, and ii) in Non-rigid Feature Fusion (NFF), features of the fixed-moving pair are aligned by DFI-integrated sub-field, and then fused to predict a finer sub-field. Leveraging both DFI and NFF, the target deformation field is factorized into multi-scale sub-fields, where the coarser fields alleviate the estimate of a finer one and the finer field learns to make up those misalignments insolvable by previous coarser ones. The extensive and comprehensive experimental results on both private and public datasets demonstrate a superior registration performance of brain MRI images over progressive registration only and coarse-to-fine estimation only, with an increase by at most 8% in the average Dice.
ASOct 22, 2020
The NTU-AISG Text-to-speech System for Blizzard Challenge 2020Haobo Zhang, Tingzhi Mao, Haihua Xu et al.
We report our NTU-AISG Text-to-speech (TTS) entry systems for the Blizzard Challenge 2020 in this paper. There are two TTS tasks in this year's challenge, one is a Mandarin TTS task, the other is a Shanghai dialect TTS task. We have participated both. One of the main challenges is to build TTS systems with low-resource constraints, particularly for the case of Shanghai dialect, of which about three hours data are available to participants. To overcome the constraint, we adopt an average-speaker modeling method. That is, we first employ external Mandarin data to train both End-to-end acoustic model and WaveNet vocoder, then we use Shanghai dialect to tune the acoustic model and WaveNet vocoder respectively. Apart from this, we have no Shanghai dialect lexicon despite syllable transcripts are provided for the training data. Since we are not sure if similar syllable transcripts are provided for the evaluation data during the training stage, we use Mandarin lexicon for Shanghai dialect instead. With the letter, as decomposed from the corresponding Mandarin syllable, as input, though the naturalness and original speaker similarity of the synthesized speech are good, subjective evaluation results indicate the intelligibility of the synthesized speech is deeply undermined for the Shanghai dialect TTS system.
ASOct 22, 2020
Multilingual Approach to Joint Speech and Accent Recognition with DNN-HMM FrameworkYizhou Peng, Jicheng Zhang, Haobo Zhang et al.
Human can recognize speech, as well as the peculiar accent of the speech simultaneously. However, present state-of-the-art ASR system can rarely do that. In this paper, we propose a multilingual approach to recognizing English speech, and related accent that speaker conveys using DNN-HMM framework. Specifically, we assume different accents of English as different languages. We then merge them together and train a multilingual ASR system. During decoding, we conduct two experiments. One is a monolingual ASR-based decoding, with the accent information embedded at phone level, realizing word-based accent recognition (AR), and the other is a multilingual ASR-based decoding, realizing an approximated utterance-based AR. Experimental results on an 8-accent English speech recognition show both methods can yield WERs close to the conventional ASR systems that completely ignore the accent, as well as desired AR accuracy. Besides, we conduct extensive analysis for the proposed method, such as transfer learning without-domain data exploitation, cross-accent recognition confusion, as well as characteristics of accented-word.