ASOct 31, 2022Code
Fast and parallel decoding for transducerWei Kang, Liyong Guo, Fangjun Kuang et al. · nvidia
The transducer architecture is becoming increasingly popular in the field of speech recognition, because it is naturally streaming as well as high in accuracy. One of the drawbacks of transducer is that it is difficult to decode in a fast and parallel way due to an unconstrained number of symbols that can be emitted per time step. In this work, we introduce a constrained version of transducer loss to learn strictly monotonic alignments between the sequences; we also improve the standard greedy search and beam search algorithms by limiting the number of symbols that can be emitted per time step in transducer decoding, making it more efficient to decode in parallel with batches. Furthermore, we propose an finite state automaton-based (FSA) parallel beam search algorithm that can run with graphs on GPU efficiently. The experiment results show that we achieve slight word error rate (WER) improvement as well as significant speedup in decoding. Our work is open-sourced and publicly available\footnote{https://github.com/k2-fsa/icefall}.
ASOct 31, 2022Code
Delay-penalized transducer for low-latency streaming ASRWei Kang, Zengwei Yao, Fangjun Kuang et al. · nvidia
In streaming automatic speech recognition (ASR), it is desirable to reduce latency as much as possible while having minimum impact on recognition accuracy. Although a few existing methods are able to achieve this goal, they are difficult to implement due to their dependency on external alignments. In this paper, we propose a simple way to penalize symbol delay in transducer model, so that we can balance the trade-off between symbol delay and accuracy for streaming models without external alignments. Specifically, our method adds a small constant times (T/2 - t), where T is the number of frames and t is the current frame, to all the non-blank log-probabilities (after normalization) that are fed into the two dimensional transducer recursion. For both streaming Conformer models and unidirectional long short-term memory (LSTM) models, experimental results show that it can significantly reduce the symbol delay with an acceptable performance degradation. Our method achieves similar delay-accuracy trade-off to the previously published FastEmit, but we believe our method is preferable because it has a better justification: it is equivalent to penalizing the average symbol delay. Our work is open-sourced and publicly available (https://github.com/k2-fsa/k2).
ASOct 31, 2022Code
Predicting Multi-Codebook Vector Quantization Indexes for Knowledge DistillationLiyong Guo, Xiaoyu Yang, Quandong Wang et al. · nvidia
Knowledge distillation(KD) is a common approach to improve model performance in automatic speech recognition (ASR), where a student model is trained to imitate the output behaviour of a teacher model. However, traditional KD methods suffer from teacher label storage issue, especially when the training corpora are large. Although on-the-fly teacher label generation tackles this issue, the training speed is significantly slower as the teacher model has to be evaluated every batch. In this paper, we reformulate the generation of teacher label as a codec problem. We propose a novel Multi-codebook Vector Quantization (MVQ) approach that compresses teacher embeddings to codebook indexes (CI). Based on this, a KD training framework (MVQ-KD) is proposed where a student model predicts the CI generated from the embeddings of a self-supervised pre-trained teacher model. Experiments on the LibriSpeech clean-100 hour show that MVQ-KD framework achieves comparable performance as traditional KD methods (l1, l2), while requiring 256 times less storage. When the full LibriSpeech dataset is used, MVQ-KD framework results in 13.8% and 8.2% relative word error rate reductions (WERRs) for non -streaming transducer on test-clean and test-other and 4.0% and 4.9% for streaming transducer. The implementation of this work is already released as a part of the open-source project icefall.
ASOct 17, 2023Code
Zipformer: A faster and better encoder for automatic speech recognitionZengwei Yao, Liyong Guo, Xiaoyu Yang et al.
The Conformer has become the most popular encoder model for automatic speech recognition (ASR). It adds convolution modules to a transformer to learn both local and global dependencies. In this work we describe a faster, more memory-efficient, and better-performing transformer, called Zipformer. Modeling changes include: 1) a U-Net-like encoder structure where middle stacks operate at lower frame rates; 2) reorganized block structure with more modules, within which we re-use attention weights for efficiency; 3) a modified form of LayerNorm called BiasNorm allows us to retain some length information; 4) new activation functions SwooshR and SwooshL work better than Swish. We also propose a new optimizer, called ScaledAdam, which scales the update by each tensor's current scale to keep the relative change about the same, and also explictly learns the parameter scale. It achieves faster convergence and better performance than Adam. Extensive experiments on LibriSpeech, Aishell-1, and WenetSpeech datasets demonstrate the effectiveness of our proposed Zipformer over other state-of-the-art ASR models. Our code is publicly available at https://github.com/k2-fsa/icefall.
ASJun 23, 2022
Pruned RNN-T for fast, memory-efficient ASR trainingFangjun Kuang, Liyong Guo, Wei Kang et al.
The RNN-Transducer (RNN-T) framework for speech recognition has been growing in popularity, particularly for deployed real-time ASR systems, because it combines high accuracy with naturally streaming recognition. One of the drawbacks of RNN-T is that its loss function is relatively slow to compute, and can use a lot of memory. Excessive GPU memory usage can make it impractical to use RNN-T loss in cases where the vocabulary size is large: for example, for Chinese character-based ASR. We introduce a method for faster and more memory-efficient RNN-T loss computation. We first obtain pruning bounds for the RNN-T recursion using a simple joiner network that is linear in the encoder and decoder embeddings; we can evaluate this without using much memory. We then use those pruning bounds to evaluate the full, non-linear joiner network.
ASSep 14, 2023Code
PromptASR for contextualized ASR with controllable styleXiaoyu Yang, Wei Kang, Zengwei Yao et al.
Prompts are crucial to large language models as they provide context information such as topic or logical relationships. Inspired by this, we propose PromptASR, a framework that integrates prompts in end-to-end automatic speech recognition (E2E ASR) systems to achieve contextualized ASR with controllable style of transcriptions. Specifically, a dedicated text encoder encodes the text prompts and the encodings are injected into the speech encoder by cross-attending the features from two modalities. When using the ground truth text from preceding utterances as content prompt, the proposed system achieves 21.9% and 6.8% relative word error rate reductions on a book reading dataset and an in-house dataset compared to a baseline ASR system. The system can also take word-level biasing lists as prompt to improve recognition accuracy on rare words. An additional style prompt can be given to the text encoder and guide the ASR system to output different styles of transcriptions. The code is available at icefall.
CLApr 1Code
OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language ModelsHan Zhu, Lingxuan Ye, Wei Kang et al.
We present OmniVoice, a massive multilingual zero-shot text-to-speech (TTS) model that scales to over 600 languages. At its core is a novel diffusion language model-style discrete non-autoregressive (NAR) architecture. Unlike conventional discrete NAR models that suffer from performance bottlenecks in complex two-stage (text-to-semantic-to-acoustic) pipelines, OmniVoice directly maps text to multi-codebook acoustic tokens. This simplified approach is facilitated by two key technical innovations: (1) a full-codebook random masking strategy for efficient training, and (2) initialization from a pre-trained LLM to ensure superior intelligibility. By leveraging a 581k-hour multilingual dataset curated entirely from open-source data, OmniVoice achieves the broadest language coverage to date and delivers state-of-the-art performance across Chinese, English, and diverse multilingual benchmarks. Our code and pre-trained models are publicly available at https://github.com/k2-fsa/OmniVoice.
OCMay 1, 2022
Neural Network Optimal Feedback Control with Guaranteed Local StabilityTenavi Nakamura-Zimmerer, Qi Gong, Wei Kang
Recent research shows that supervised learning can be an effective tool for designing near-optimal feedback controllers for high-dimensional nonlinear dynamic systems. But the behavior of neural network controllers is still not well understood. In particular, some neural networks with high test accuracy can fail to even locally stabilize the dynamic system. To address this challenge we propose several novel neural network architectures, which we show guarantee local asymptotic stability while retaining the approximation capacity to learn the optimal feedback policy semi-globally. The proposed architectures are compared against standard neural network feedback controllers through numerical simulations of two high-dimensional nonlinear optimal control problems: stabilization of an unstable Burgers-type partial differential equation, and altitude and course tracking for an unmanned aerial vehicle. The simulations demonstrate that standard neural networks can fail to stabilize the dynamics even when trained well, while the proposed architectures are always at least locally stabilizing and can achieve near-optimal performance.
DSNov 24, 2011
The Consistency of Partial Observability for PDEsWei Kang
In this paper, a new definition of observability is introduced for PDEs. It is a quantitative measure of partial observability. The quantity is proved to be consistent if approximated using well posed approximation schemes. A first order approximation of an unobservability index using empirical gramian is introduced. For linear systems with full state observability, the empirical gramian is equivalent to the observability gramian in control theory. The consistency of the defined observability is exemplified using a Burgers' equation.
NAFeb 10, 2017
Solving 1D Conservation Laws Using Pontryagin's Minimum PrincipleWei Kang, Lucas C. Wilcox
This paper discusses a connection between scalar convex conservation laws and Pontryagin's minimum principle. For flux functions for which an associated optimal control problem can be found, a minimum value solution of the conservation law is proposed. For scalar space-independent convex conservation laws such a control problem exists and the minimum value solution of the conservation law is equivalent to the entropy solution. This can be seen as a generalization of the Lax--Oleinik formula to convex (not necessarily uniformly convex) flux functions. Using Pontryagin's minimum principle, an algorithm for finding the minimum value solution pointwise of scalar convex conservation laws is given. Numerical examples of approximating the solution of both space-dependent and space-independent conservation laws are provided to demonstrate the accuracy and applicability of the proposed algorithm. Furthermore, a MATLAB routine using Chebfun is provided (along with demonstration code on how to use it) to approximately solve scalar convex conservation laws with space-independent flux functions.
CRApr 13, 2022
Towards A Critical Evaluation of Robustness for Deep Learning Backdoor CountermeasuresHuming Qiu, Hua Ma, Zhi Zhang et al.
Since Deep Learning (DL) backdoor attacks have been revealed as one of the most insidious adversarial attacks, a number of countermeasures have been developed with certain assumptions defined in their respective threat models. However, the robustness of these countermeasures is inadvertently ignored, which can introduce severe consequences, e.g., a countermeasure can be misused and result in a false implication of backdoor detection. For the first time, we critically examine the robustness of existing backdoor countermeasures with an initial focus on three influential model-inspection ones that are Neural Cleanse (S&P'19), ABS (CCS'19), and MNTD (S&P'21). Although the three countermeasures claim that they work well under their respective threat models, they have inherent unexplored non-robust cases depending on factors such as given tasks, model architectures, datasets, and defense hyper-parameter, which are \textit{not even rooted from delicate adaptive attacks}. We demonstrate how to trivially bypass them aligned with their respective threat models by simply varying aforementioned factors. Particularly, for each defense, formal proofs or empirical studies are used to reveal its two non-robust cases where it is not as robust as it claims or expects, especially the recent MNTD. This work highlights the necessity of thoroughly evaluating the robustness of backdoor countermeasures to avoid their misleading security implications in unknown non-robust cases.
SYMar 9, 2022
Machine Learning based Optimal Feedback Control for Microgrid StabilizationTianwei Xia, Kai Sun, Wei Kang
Microgrids have more operational flexibilities as well as uncertainties than conventional power grids, especially when renewable energy resources are utilized. An energy storage based feedback controller can compensate undesired dynamics of a microgrid to improve its stability. However, the optimal feedback control of a microgrid subject to a large disturbance needs to solve a Hamilton-Jacobi-Bellman problem. This paper proposes a machine learning-based optimal feedback control scheme. Its training dataset is generated from a linear-quadratic regulator and a brute-force method respectively addressing small and large disturbances. Then, a three-layer neural network is constructed from the data for the purpose of optimal feedback control. A case study is carried out for a microgrid model based on a modified Kundur two-area system to test the real-time performance of the proposed control scheme.
SDSep 1, 2024
LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker DiarizationZengrui Jin, Yifan Yang, Mohan Shi et al.
The evolving speech processing landscape is increasingly focused on complex scenarios like meetings or cocktail parties with multiple simultaneous speakers and far-field conditions. Existing methodologies for addressing these challenges fall into two categories: multi-channel and single-channel solutions. Single-channel approaches, notable for their generality and convenience, do not require specific information about microphone arrays. This paper presents a large-scale far-field overlapping speech dataset, crafted to advance research in speech separation, recognition, and speaker diarization. This dataset is a critical resource for decoding ``Who said What and When'' in multi-talker, reverberant environments, a daunting challenge in the field. Additionally, we introduce a pipeline system encompassing speech separation, recognition, and diarization as a foundational benchmark. Evaluations on the WHAMR! dataset validate the broad applicability of the proposed data.
ITApr 16
On Demand-Private Coded Caching With Multiple DemandsQinyi Lu, Nan Liu, Wei Kang
We consider a coded caching problem with multiple demands under a privacy constraint. In this problem, a server with access to \(N\) files serves \(K\) users over a shared link, and each user requests \(L\) distinct files. The privacy constraint requires that each user obtain no information about the demands of the other users. We propose a new achievable scheme for arbitrary numbers of files and users. The scheme is obtained via a transformation from a non-private coded caching scheme under uncoded placement for \(N\) files and \(K \cdot \min\{N,KL\}\) users, where each user requests one file and the demands are restricted to a subset of all possible demands. We then derive a converse bound, and the proposed scheme is shown to be order optimal within a factor of 6 of this bound.
NAJul 14, 2023
A Surrogate Data Assimilation Model for the Estimation of Dynamical System in a Limited AreaWei Kang, Liang Xu, Hong Zhou
We propose a novel learning-based surrogate data assimilation (DA) model for efficient state estimation in a limited area. Our model employs a feedforward neural network for online computation, eliminating the need for integrating high-dimensional limited-area models. This approach offers significant computational advantages over traditional DA algorithms. Furthermore, our method avoids the requirement of lateral boundary conditions for the limited-area model in both online and offline computations. The design of our surrogate DA model is built upon a robust theoretical framework that leverages two fundamental concepts: observability and effective region. The concept of observability enables us to quantitatively determine the optimal amount of observation data necessary for accurate DA. Meanwhile, the concept of effective region substantially reduces the computational burden associated with computing observability and generating training data.
ASJul 12, 2025Code
ZipVoice-Dialog: Non-Autoregressive Spoken Dialogue Generation with Flow MatchingHan Zhu, Wei Kang, Liyong Guo et al.
Generating spoken dialogue is more challenging than monologue text-to-speech (TTS) due to the need for realistic turn-taking and distinct speaker timbres. Existing spoken dialogue generation models, being auto-regressive, suffer from slow and unstable inference. To overcome these limitations, we introduce ZipVoice-Dialog, a non-autoregressive zero-shot spoken dialogue generation model built upon flow matching. Key designs include: 1) speaker-turn embeddings for precise speaker turn-taking; 2) a curriculum learning strategy for stable speech-text alignment; 3) specialized strategies to enable stereo dialogue generation. Additionally, recognizing the lack of open-source large-scale spoken dialogue datasets, we curated OpenDialog, a 6.8k-hour spoken dialogue dataset from in-the-wild speech data. Furthermore, we established a benchmark to comprehensively evaluate various models. Experimental results demonstrate that ZipVoice-Dialog achieves superior performance in intelligibility, speaker turn-taking accuracy, speaker similarity, and inference speed. Our codes, model checkpoints, demo samples, and the OpenDialog dataset are all publicly available at https://github.com/k2-fsa/ZipVoice.
AIJul 8, 2024
Enhanced Safety in Autonomous Driving: Integrating Latent State Diffusion Model for End-to-End NavigationDetian Chu, Linyuan Bai, Jianuo Huang et al.
With the advancement of autonomous driving, ensuring safety during motion planning and navigation is becoming more and more important. However, most end-to-end planning methods suffer from a lack of safety. This research addresses the safety issue in the control optimization problem of autonomous driving, formulated as Constrained Markov Decision Processes (CMDPs). We propose a novel, model-based approach for policy optimization, utilizing a conditional Value-at-Risk based Soft Actor Critic to manage constraints in complex, high-dimensional state spaces effectively. Our method introduces a worst-case actor to guide safe exploration, ensuring rigorous adherence to safety requirements even in unpredictable scenarios. The policy optimization employs the Augmented Lagrangian method and leverages latent diffusion models to predict and simulate future trajectories. This dual approach not only aids in navigating environments safely but also refines the policy's performance by integrating distribution modeling to account for environmental uncertainties. Empirical evaluations conducted in both simulated and real environment demonstrate that our approach outperforms existing methods in terms of safety, efficiency, and decision-making capabilities.
ASMay 19, 2023Code
Blank-regularized CTC for Frame Skipping in Neural TransducerYifan Yang, Xiaoyu Yang, Liyong Guo et al.
Neural Transducer and connectionist temporal classification (CTC) are popular end-to-end automatic speech recognition systems. Due to their frame-synchronous design, blank symbols are introduced to address the length mismatch between acoustic frames and output tokens, which might bring redundant computation. Previous studies managed to accelerate the training and inference of neural Transducers by discarding frames based on the blank symbols predicted by a co-trained CTC. However, there is no guarantee that the co-trained CTC can maximize the ratio of blank symbols. This paper proposes two novel regularization methods to explicitly encourage more blanks by constraining the self-loop of non-blank symbols in the CTC. It is interesting to find that the frame reduction ratio of the neural Transducer can approach the theoretical boundary. Experiments on LibriSpeech corpus show that our proposed method accelerates the inference of neural Transducer by 4 times without sacrificing performance. Our work is open-sourced and publicly available https://github.com/k2-fsa/icefall.
LGSep 25, 2025
EvoMail: Self-Evolving Cognitive Agents for Adaptive Spam and Phishing Email DefenseWei Huang, De-Tian Chu, Lin-Yuan Bai et al.
Modern email spam and phishing attacks have evolved far beyond keyword blacklists or simple heuristics. Adversaries now craft multi-modal campaigns that combine natural-language text with obfuscated URLs, forged headers, and malicious attachments, adapting their strategies within days to bypass filters. Traditional spam detection systems, which rely on static rules or single-modality models, struggle to integrate heterogeneous signals or to continuously adapt, leading to rapid performance degradation. We propose EvoMail, a self-evolving cognitive agent framework for robust detection of spam and phishing. EvoMail first constructs a unified heterogeneous email graph that fuses textual content, metadata (headers, senders, domains), and embedded resources (URLs, attachments). A Cognitive Graph Neural Network enhanced by a Large Language Model (LLM) performs context-aware reasoning across these sources to identify coordinated spam campaigns. Most critically, EvoMail engages in an adversarial self-evolution loop: a ''red-team'' agent generates novel evasion tactics -- such as character obfuscation or AI-generated phishing text -- while the ''blue-team'' detector learns from failures, compresses experiences into a memory module, and reuses them for future reasoning. Extensive experiments on real-world datasets (Enron-Spam, Ling-Spam, SpamAssassin, and TREC) and synthetic adversarial variants demonstrate that EvoMail consistently outperforms state-of-the-art baselines in detection accuracy, adaptability to evolving spam tactics, and interpretability of reasoning traces. These results highlight EvoMail's potential as a resilient and explainable defense framework against next-generation spam and phishing threats.
CLSep 18, 2025
Controlling Language Difficulty in Dialogues with Linguistic FeaturesShuyao Xu, Wenguang Wang, Handong Gao et al.
Large language models (LLMs) have emerged as powerful tools for supporting second language acquisition, particularly in simulating interactive dialogues for speaking practice. However, adapting the language difficulty of LLM-generated responses to match learners' proficiency levels remains a challenge. This work addresses this issue by proposing a framework for controlling language proficiency in educational dialogue systems. Our approach leverages three categories of linguistic features, readability features (e.g., Flesch-Kincaid Grade Level), syntactic features (e.g., syntactic tree depth), and lexical features (e.g., simple word ratio), to quantify and regulate text complexity. We demonstrate that training LLMs on linguistically annotated dialogue data enables precise modulation of language proficiency, outperforming prompt-based methods in both flexibility and stability. To evaluate this, we introduce Dilaprix, a novel metric integrating the aforementioned features, which shows strong correlation with expert judgments of language difficulty. Empirical results reveal that our approach achieves superior controllability of language proficiency while maintaining high dialogue quality.
LGJun 18, 2025
HiPreNets: High-Precision Neural Networks through Progressive TrainingEthan Mulle, Wei Kang, Qi Gong
Deep neural networks are powerful tools for solving nonlinear problems in science and engineering, but training highly accurate models becomes challenging as problem complexity increases. Non-convex optimization and numerous hyperparameters to tune make performance improvement difficult, and traditional approaches often prioritize minimizing mean squared error (MSE) while overlooking $L^{\infty}$ error, which is the critical focus in many applications. To address these challenges, we present a progressive framework for training and tuning high-precision neural networks (HiPreNets). Our approach refines a previously explored staged training technique for neural networks that improves an existing fully connected neural network by sequentially learning its prediction residuals using additional networks, leading to improved overall accuracy. We discuss how to take advantage of the structure of the residuals to guide the choice of loss function, number of parameters to use, and ways to introduce adaptive data sampling techniques. We validate our framework's effectiveness through several benchmark problems.
SPMar 21, 2024
Rolling bearing fault diagnosis method based on generative adversarial enhanced multi-scale convolutional neural network modelMaoxuan Zhou, Wei Kang, Kun He
In order to solve the problem that current convolutional neural networks can not capture the correlation features between the time domain signals of rolling bearings effectively, and the model accuracy is limited by the number and quality of samples, a rolling bearing fault diagnosis method based on generative adversarial enhanced multi-scale convolutional neural network model is proposed. Firstly, Gram angular field coding technique is used to encode the time domain signal of the rolling bearing and generate the feature map to retain the complete information of the vibration signal. Then, the re-sulting data is divided into a training set, a validation set, and a test set. Among them, the training set is input into the gradient penalty Wasserstein distance generation adversarial network to complete the training, and a new sample with similar features to the training sample is obtained, and then the original training set is expanded. Next, multi-scale convolution is used to extract the fault features of the extended training set, and the feature graph is normalized by example to overcome the influence of the difference in feature distribution. Finally, the attention mechanism is applied to the adaptive weighting of normalized features and the extraction of deep features, and the fault diagnosis is completed by the softmax classifier. Compared with ResNet method, the experimental results show that the proposed method has better generalization performance and anti-noise performance.
DSDec 29, 2021
Data-Driven Computational Methods for the Domain of Attraction and Zubov's EquationWei Kang, Kai Sun, Liang Xu
This paper deals with a special type of Lyapunov functions, namely the solution of Zubov's equation. Such a function can be used to characterize the domain of attraction for systems of ordinary differential equations. We derive and prove an integral form solution to Zubov's equation. For numerical computation, we develop two data-driven methods. One is based on the integration of an augmented system of differential equations; and the other one is based on deep learning. The former is effective for systems with a relatively low state space dimension and the latter is developed for high dimensional problems. The deep learning method is applied to a New England 10-generator power system model. We prove that a neural network approximation exists for the Lyapunov function of power systems such that the approximation error is a cubic polynomial of the number of generators. The error convergence rate as a function of n, the number of neurons, is proved.
OCSep 15, 2021
Neural network optimal feedback control with enhanced closed loop stabilityTenavi Nakamura-Zimmerer, Qi Gong, Wei Kang
Recent research has shown that supervised learning can be an effective tool for designing optimal feedback controllers for high-dimensional nonlinear dynamic systems. But the behavior of these neural network (NN) controllers is still not well understood. In this paper we use numerical simulations to demonstrate that typical test accuracy metrics do not effectively capture the ability of an NN controller to stabilize a system. In particular, some NNs with high test accuracy can fail to stabilize the dynamics. To address this we propose two NN architectures which locally approximate a linear quadratic regulator (LQR). Numerical simulations confirm our intuition that the proposed architectures reliably produce stabilizing feedback controllers without sacrificing optimality. In addition, we introduce a preliminary theoretical result describing some stability properties of such NN-controlled systems.
DLDec 15, 2020
On how Cognitive Computing will plan your next Systematic ReviewMaisie Badami, Marcos Baez, Shayan Zamanirad et al.
Systematic literature reviews (SLRs) are at the heart of evidence-based research, setting the foundation for future research and practice. However, producing good quality timely contributions is a challenging and highly cognitive endeavor, which has lately motivated the exploration of automation and support in the SLR process. In this paper we address an often overlooked phase in this process, that of planning literature reviews, and explore under the lenses of cognitive process augmentation how to overcome its most salient challenges. In doing so, we report on the insights from 24 SLR authors on planning practices, its challenges as well as feedback on support strategies inspired by recent advances in cognitive computing. We frame our findings under the cognitive augmentation framework, and report on a prototype implementation and evaluation focusing on further informing the technical feasibility.
LGDec 3, 2020
Neural Network Approximations of Compositional Functions With Applications to Dynamical SystemsWei Kang, Qi Gong
As demonstrated in many areas of real-life applications, neural networks have the capability of dealing with high dimensional data. In the fields of optimal control and dynamical systems, the same capability was studied and verified in many published results in recent years. Towards the goal of revealing the underlying reason why neural networks are capable of solving some high dimensional problems, we develop an algebraic framework and an approximation theory for compositional functions and their neural network approximations. The theoretical foundation is developed in a way so that it supports the error analysis for not only functions as input-output relations, but also numerical algorithms. This capability is critical because it enables the analysis of approximation errors for problems for which analytic solutions are not available, such as differential equations and optimal control. We identify a set of key features of compositional functions and the relationship between the features and the complexity of neural networks. In addition to function approximations, we prove several formulae of error upper bounds for neural networks that approximate the solutions to differential equations, optimization, and optimal control.
OCSep 11, 2020
QRnet: optimal regulator design with LQR-augmented neural networksTenavi Nakamura-Zimmerer, Qi Gong, Wei Kang
In this paper we propose a new computational method for designing optimal regulators for high-dimensional nonlinear systems. The proposed approach leverages physics-informed machine learning to solve high-dimensional Hamilton-Jacobi-Bellman equations arising in optimal feedback control. Concretely, we augment linear quadratic regulators with neural networks to handle nonlinearities. We train the augmented models on data generated without discretizing the state space, enabling application to high-dimensional problems. We use the proposed method to design a candidate optimal regulator for an unstable Burgers' equation, and through this example, demonstrate improved robustness and accuracy compared to existing neural network formulations.
DSNov 21, 2019
Density Propagation with Characteristics-based Deep LearningTenavi Nakamura-Zimmerer, Daniele Venturi, Qi Gong et al.
Uncertainty propagation in nonlinear dynamic systems remains an outstanding problem in scientific computing and control. Numerous approaches have been developed, but are limited in their capability to tackle problems with more than a few uncertain variables or require large amounts of simulation data. In this paper, we propose a data-driven method for approximating joint probability density functions (PDFs) of nonlinear dynamic systems with initial condition and parameter uncertainty. Our approach leverages on the power of deep learning to deal with high-dimensional inputs, but we overcome the need for huge quantities of training data by encoding PDF evolution equations directly into the optimization problem. We demonstrate the potential of the proposed method by applying it to evaluate the robustness of a feedback controller for a six-dimensional rigid body with parameter uncertainty.
OCJul 11, 2019
Adaptive Deep Learning for High-Dimensional Hamilton-Jacobi-Bellman EquationsTenavi Nakamura-Zimmerer, Qi Gong, Wei Kang
Computing optimal feedback controls for nonlinear systems generally requires solving Hamilton-Jacobi-Bellman (HJB) equations, which are notoriously difficult when the state dimension is large. Existing strategies for high-dimensional problems often rely on specific, restrictive problem structures, or are valid only locally around some nominal trajectory. In this paper, we propose a data-driven method to approximate semi-global solutions to HJB equations for general high-dimensional nonlinear systems and compute candidate optimal feedback controls in real-time. To accomplish this, we model solutions to HJB equations with neural networks (NNs) trained on data generated without discretizing the state space. Training is made more effective and data-efficient by leveraging the known physics of the problem and using the partially-trained NN to aid in adaptive data generation. We demonstrate the effectiveness of our method by learning solutions to HJB equations corresponding to the attitude control of a six-dimensional nonlinear rigid body, and nonlinear systems of dimension up to 30 arising from the stabilization of a Burgers'-type partial differential equation. The trained NNs are then used for real-time feedback control of these systems.
SYJun 20, 2017
Relative and Mean Motions of Multi-Machine Power Systems in Classical ModelBin Wang, Kai Sun, Wei Kang
It is well-known that in an m-machine power system where each machine is represented by a second-order differential equation, the Jacobian of the system equation contains (m-1) pairs of conjugate eigenvalues and two real eigenvalues, including at least one zero. This letter proves that under the uniform damping condition, the dynamics associated with the two real eigenvalues do not have any impact on the dynamics associated with those complex eigenvalues. This conclusion is important to justify the use of the relative motions or center-of-inertia (COI) coordinate to analyze the rotor angle stability in a multi-machine power system.
SYNov 14, 2016
Nonlinear Modal Decoupling of Multi-Oscillator Systems with Applications to Power SystemsBin Wang, Kai Sun, Wei Kang
Many natural and manmade dynamical systems that are modeled as large nonlinear multi-oscillator systems like power systems are hard to analyze. For such a system, we propose a nonlinear modal decoupling (NMD) approach inversely constructing as many decoupled nonlinear oscillators as the system oscillation modes so that individual decoupled oscillators can easily be analyzed to infer dynamics and stability of the original system. The NMD follows a similar idea to the normal form except that we eliminate inter-modal terms but allow intra-modal terms of desired nonlinearities in decoupled systems, so decoupled systems can flexibly be shaped into desired forms of nonlinear oscillators. The NMD is then applied to power systems towards two types of nonlinear oscillators, i.e. the single-machine-infinite-bus (SMIB) systems and a proposed non-SMIB oscillator. Numerical studies on a 3-machine 9-bus system and New England 10-machine 39-bus system show that (i) decoupled oscillators keep a majority of the original system modal nonlinearities and the NMD provides a bigger validity region than the normal form, and (ii) decoupled non-SMIB oscillators may keep more authentic dynamics of the original system than decoupled SMIB systems.
ITAug 25, 2014
Compressing Encrypted Data and Permutation CipherWei Kang, Nan Liu
In a system that performs both encryption and lossy compression, the conventional way is to compress first and then encrypt the compressed data. This separation approach proves to be optimal. In certain applications where sensitive information should be protected as early as possible, it is preferable to perform encryption first and then compress the encrypted data, which leads to the concept of the reversed system. Johnson et al. proposed an achievability scheme for the reversed system that has a modulo-sum encryption followed by a compression using Wyner-Ziv distributed source coding with side information. However, this reversed system performs worse than the conventional system in the sense that it requires more compression rate and secrecy key rate. In this paper, we propose a new achievability scheme for the reverse system where encryption is conducted by a permutation cipher and then the encrypted data is compressed using the optimal rate-distortion code. The proposed scheme can achieve the optimal compression rate and secret key rate, and therefore shows that reversing the order of encryption and compression does not necessarily compromise the performance of an encryption-compression system. The proposed system attains weak secrecy, and we show that the information leakage is mainly contributed by the type information of the sequence, which is not concealed by the permutation cipher. Given the type of the sequence, the rest of the information leakage vanishes exponentially.
ITJun 12, 2014
Deception with Side Information in Biometric Authentication SystemsWei Kang, Daming Cao, Nan Liu
In this paper, we study the probability of successful deception of an uncompressed biometric authentication system with side information at the adversary. It represents the scenario where the adversary may have correlated side information, e.g.,~a partial finger print or a DNA sequence of a relative of the legitimate user. We find the optimal exponent of the deception probability by proving both the achievability and the converse. Our proofs are based on the connection between the problem of deception with side information and the rate distortion problem with side information at both the encoder and decoder.