Qiang Du

LG
h-index 5
26 papers
559 citations
Novelty 58%
AI Score 43

26 Papers

NA Feb 13, 2019
Maximum principle preserving exponential time differencing schemes for the nonlocal Allen-Cahn equation

Qiang Du, Lili Ju, Xiao Li et al.

The nonlocal Allen-Cahn (NAC) equation generalizes the classic Allen-Cahn equation by replacing the Laplacian with a parameterized nonlocal diffusion operator and, like its local counterpart, satisfies the maximum principle. In this paper, we develop and analyze first and second order exponential time differencing (ETD) schemes for solving the NAC equation, which unconditionally preserve the discrete maximum principle. The fully discrete numerical schemes are obtained by applying the stabilized ETD approximations for time integration with the quadrature-based finite difference discretization in space. We derive their respective optimal maximum-norm error estimates and further show that the proposed schemes are asymptotically compatible, i.e., the approximate solutions always converge to the classic Allen-Cahn solution when the horizon, the spatial mesh size and the time step size go to zero. We also prove that the schemes are energy stable in the discrete sense. Various experiments are performed to verify these theoretical results and to investigate numerically the relation between the discontinuities and the nonlocal parameters.
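
The stabilized first-order ETD idea in this abstract can be illustrated with a minimal sketch. It is not the paper's scheme: it assumes the local Allen-Cahn equation u_t = eps^2 u_xx + u - u^3 on a periodic 1D grid with a three-point Laplacian in place of the nonlocal operator, and the function names, parameter values, and naive DFT helper are all hypothetical choices made for illustration. The linear part is shifted by a stabilizing constant kappa >= 2 and integrated exactly, while the remaining nonlinear term is frozen over each step.

```python
import cmath
import math


def dft(values, sign):
    """Naive O(n^2) discrete Fourier transform (sign=-1 forward, +1 inverse, unnormalized)."""
    n = len(values)
    return [
        sum(v * cmath.exp(sign * 2j * math.pi * j * k / n) for j, v in enumerate(values))
        for k in range(n)
    ]


def etd1_allen_cahn(u0, eps=0.1, kappa=2.0, tau=0.5, steps=20):
    """Stabilized ETD1 for u_t = eps^2 u_xx + u - u^3 on a periodic grid over [0, 1).

    Illustrative assumption: a local three-point Laplacian stands in for the
    nonlocal diffusion operator discussed in the abstract.
    """
    n = len(u0)
    h = 1.0 / n
    # Eigenvalues of A = kappa*I - eps^2 * L_h, with L_h the periodic 3-point Laplacian.
    a = [kappa + eps**2 * (2.0 - 2.0 * math.cos(2.0 * math.pi * k / n)) / h**2
         for k in range(n)]
    decay = [math.exp(-tau * ak) for ak in a]          # exp(-tau * A) in Fourier space
    phi = [(1.0 - d) / ak for d, ak in zip(decay, a)]  # A^{-1} (I - exp(-tau * A))
    u = list(u0)
    for _ in range(steps):
        # Stabilized nonlinear term N(u) = kappa*u + u - u^3, frozen over the step.
        nonlin = [kappa * x + x - x**3 for x in u]
        u_hat, n_hat = dft(u, -1), dft(nonlin, -1)
        new_hat = [d * uh + p * nh for d, p, uh, nh in zip(decay, phi, u_hat, n_hat)]
        u = [z.real / n for z in dft(new_hat, +1)]
    return u


u0 = [math.sin(2.0 * math.pi * j / 32) for j in range(32)]
u = etd1_allen_cahn(u0)
print(max(abs(x) for x in u))
```

With initial data bounded by 1 in absolute value, the iterates should stay within that bound (up to rounding) even for the fairly large step tau = 0.5, which is the discrete maximum principle in this simplified local setting.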

OC Dec 10, 2020
A Game-Theoretic Framework for Autonomous Vehicles Velocity Control: Bridging Microscopic Differential Games and Macroscopic Mean Field Games

Kuang Huang, Xuan Di, Qiang Du et al.

This paper proposes an efficient computational framework for longitudinal velocity control of a large number of autonomous vehicles (AVs) and develops a traffic flow theory for AVs. Instead of hypothesizing explicitly how AVs drive, our goal is to design future AVs as rational, utility-optimizing agents that continuously select optimal velocity over a planning horizon. With a large number of interacting AVs, this design problem can become computationally intractable. This paper aims to tackle such a challenge by employing mean field approximation and deriving a mean field game (MFG) as the limiting differential game with an infinite number of agents. The proposed micro-macro model allows one to define individuals on a microscopic level as utility-optimizing agents while translating rich microscopic behaviors to macroscopic models. Different from existing studies on the application of MFG to traffic flow models, the present study offers a systematic framework to apply MFG to autonomous vehicle velocity control. The MFG-based AV controller is shown to mitigate traffic jams faster than the LWR-based controller. MFG also embodies classical traffic flow models with behavioral interpretation, thereby providing a new traffic flow theory for AVs.

NA Dec 4, 2017
A quasinonlocal coupling method for nonlocal and local diffusion models

Qiang Du, Xingjie Helen Li, Jianfeng Lu et al.

In this paper, we extend the idea of "geometric reconstruction" to couple a nonlocal diffusion model directly with the classical local diffusion in one-dimensional space. This new coupling framework removes interfacial inconsistency, ensures the flux balance, and satisfies energy conservation as well as the maximum principle, whereas none of the existing methods for nonlocal-to-local coupling satisfies all of these properties. We establish the well-posedness and provide the stability analysis of the coupling method. We investigate the difference to the local limiting problem in terms of the nonlocal interaction range. Furthermore, we propose a first order finite difference numerical discretization and perform several numerical tests to confirm the theoretical findings. In particular, we show that the numerical result is free of artifacts near the boundary of the domain where a classical local boundary condition is used, together with a coupled fully nonlocal model in the interior of the domain.

COMP-PH Oct 14, 2017
Stability of nonlocal Dirichlet integrals and implications for peridynamic correspondence material modeling

Qiang Du, Xiaochuan Tian

Nonlocal gradient operators are basic elements of nonlocal vector calculus that play important roles in nonlocal modeling and analysis. In this work, we extend earlier analysis on nonlocal gradient operators. In particular, we study a nonlocal Dirichlet integral that is given by a quadratic energy functional based on nonlocal gradients. Our main finding, which differs from claims made in previous studies, is that the coercivity and stability of this nonlocal continuum energy functional may hold for some properly chosen nonlocal interaction kernels but may fail for some other ones. This can be significant for possible applications of nonlocal gradient operators in various nonlocal models. In particular, we discuss some important implications for the peridynamic correspondence material models.

AP Nov 26, 2016
Nonlocal conservation laws. I. A new class of monotonicity-preserving models

Qiang Du, Zhan Huang, Philippe G. LeFloch

We introduce a new class of nonlocal nonlinear conservation laws in one space dimension that allow for nonlocal interactions over a finite horizon. The proposed model, which we refer to as the nonlocal pair interaction model, inherits at the continuum level the upwinding feature of finite difference schemes for local hyperbolic conservation laws, so that the maximum principle and certain monotonicity properties hold and, consequently, the entropy inequalities are naturally satisfied. We establish a global-in-time well-posedness theory for these models which covers a broad class of initial data. Moreover, in the limit when the horizon parameter approaches zero, we are able to prove that our nonlocal model reduces to the conventional class of local hyperbolic conservation laws. Furthermore, we propose a numerical discretization method adapted to our nonlocal model, which relies on a monotone numerical flux and a uniform mesh, and we establish that these numerical solutions converge to a solution, providing as a by-product both the existence theory for the nonlocal model and the convergence property relating the nonlocal regime and the asymptotic local regime.

NA Feb 24, 2019
A conforming DG method for linear nonlocal models with integrable kernels

Qiang Du, Xiaobo Yin

Numerical solutions of nonlocal constrained value problems with integrable kernels are considered. These nonlocal problems arise in nonlocal mechanics and nonlocal diffusion. The structure of the true solution to the problem is analyzed first. The analysis leads naturally to a new kind of discontinuous Galerkin method that efficiently solves the problem numerically. This method is shown to be asymptotically compatible. Moreover, it has an optimal convergence rate in the one-dimensional case under very weak assumptions, and an almost optimal convergence rate in the two-dimensional case under mild assumptions.

NA Jun 8, 2018
A spectral method for nonlocal diffusion operators on the sphere

Richard Mikael Slevinsky, Hadrien Montanelli, Qiang Du

We present algorithms for solving spatially nonlocal diffusion models on the unit sphere with spectral accuracy in space. Our algorithms are based on the diagonalizability of nonlocal diffusion operators in the basis of spherical harmonics, the computation of their eigenvalues to high relative accuracy using quadrature and asymptotic formulas, and a fast spherical harmonic transform. These techniques also lead to an efficient implementation of high-order exponential integrators for time-dependent models. We apply our method to the nonlocal Poisson, Allen-Cahn and Brusselator equations.

NA Dec 5, 2017
A New Phase-Field Approach to Variational Implicit Solvation of Charged Molecules with the Coulomb-Field Approximation

Yanxiang Zhao, Yanping Ma, Hui Sun et al.

We construct a new phase-field model for the solvation of charged molecules with a variational implicit solvent. Our phase-field free-energy functional includes the surface energy, solute-solvent van der Waals dispersion energy, and electrostatic interaction energy that is described by the Coulomb-field approximation, all coupled together self-consistently through a phase field. By introducing a new phase-field term in the description of the solute-solvent van der Waals and electrostatic interactions, we can keep the phase-field values closer to those describing the solute and solvent regions, respectively, making it more accurate in the free-energy estimate. We first prove that our phase-field functionals $\Gamma$-converge to the corresponding sharp-interface limit. We then develop and implement an efficient and stable numerical method to solve the resulting gradient-flow equation to obtain equilibrium conformations and their associated free energies of the underlying charged molecular system. Our numerical method combines a linear splitting scheme, spectral discretization, and exponential time differencing Runge-Kutta approximations. Applications to the solvation of single ions and a two-plate system demonstrate that our new phase-field implementation improves the previous ones by achieving the localization of the system forces near the solute-solvent interface and maintaining more robustly the desirable hyperbolic tangent profile for even larger interfacial width. This work provides a scheme to resolve the possible unphysical feature of negative values in the phase-field function found in the previous phase-field modeling (cf. H. Sun, et al. J. Chem. Phys., 2015) of charged molecules with the Poisson-Boltzmann equation for the electrostatic interaction.

NA Aug 13, 2018
New error bounds for deep networks using sparse grids

Hadrien Montanelli, Qiang Du

We prove a theorem concerning the approximation of multivariate functions by deep ReLU networks. We present new error estimates for which the curse of dimensionality is lessened by establishing a connection with sparse grids.

NA Jul 21, 2024
Computational and analytical studies of a new nonlocal phase-field crystal model in two dimensions

Qiang Du, Kai Wang, Jiang Yang

A nonlocal phase-field crystal (NPFC) model is presented as a nonlocal counterpart of the local phase-field crystal (LPFC) model and a special case of the structural PFC (XPFC) derived from classical field theory for crystal growth and phase transition. The NPFC incorporates a finite range of spatial nonlocal interactions that can account for both repulsive and attractive effects. The specific form is data-driven and determined by a fitting to the material's structure factor, which can be much more accurate than the LPFC and the previously proposed fractional variant. In particular, it is able to match the experimental data of the structure factor up to the second peak, an achievement not possible with other PFC variants studied in the literature. Both LPFC and fractional PFC (FPFC) are also shown to be distinct scaling limits of the NPFC, which reflects its generality. The advantage of NPFC in retaining material properties suggests that it may be more suitable for characterizing liquid-solid transition systems. Moreover, we study numerical discretizations using Fourier spectral methods, which are shown to be convergent and asymptotically compatible, making them robust numerical discretizations across different parameter ranges. Numerical experiments are given in the two-dimensional case to demonstrate the effectiveness of the NPFC in simulating crystal structures and grain boundaries.

FLU-DYN Mar 10
Flow Field Reconstruction via Voronoi-Enhanced Physics-Informed Neural Networks with End-to-End Sensor Placement Optimization

Renjie Xiao, Bingteng Sun, Yiling Chen et al.

(Short version of the abstract; the full version is in the article.) High-fidelity flow field reconstruction is important in fluid dynamics, but it is challenged by sparse and spatiotemporally incomplete sensor measurements, as well as failures of pre-deployed measurement points that can invalidate pre-trained reconstruction models. Physics-informed neural networks (PINNs) alleviate dependence on large labeled datasets by incorporating governing physics, yet sensor placement optimization, a key factor in reconstruction accuracy and robustness, remains underexplored. In this study, we propose a PINN with Voronoi-enhanced Sensor Optimization (VSOPINN). VSOPINN enables differentiable soft Voronoi construction for sparse sensor data rasterization, end-to-end fusion of centroidal Voronoi tessellation (CVT) with PINNs for adaptive sensor placement, and unified layout optimization for multi-condition flow reconstruction through a shared encoder-multi-decoder architecture. We validate VSOPINN on three representative problems: lid-driven cavity flow, vascular flow, and annular rotating flow. Results show that VSOPINN significantly improves reconstruction accuracy across different Reynolds numbers, adaptively learns effective sensor layouts, and remains robust under partial sensor failure. The study clarifies the intrinsic relationship between sensor placement and reconstruction precision in PINN-based flow field reconstruction.

ST Dec 29, 2024
A Particle Algorithm for Mean-Field Variational Inference

Qiang Du, Kaizheng Wang, Edith Zhang et al.

Variational inference is a fast and scalable alternative to Markov chain Monte Carlo and has been widely applied to posterior inference tasks in statistics and machine learning. A traditional approach for implementing mean-field variational inference (MFVI) is coordinate ascent variational inference (CAVI), which relies crucially on parametric assumptions on complete conditionals. In this paper, we introduce a novel particle-based algorithm for mean-field variational inference, which we term PArticle VI (PAVI). Notably, our algorithm does not rely on parametric assumptions on complete conditionals, and it applies to the nonparametric setting. We provide a non-asymptotic finite-particle convergence guarantee for our algorithm. To our knowledge, this is the first end-to-end guarantee for particle-based MFVI.

OC Mar 2, 2025
DualMS: Implicit Dual-Channel Minimal Surface Optimization for Heat Exchanger Design

Weizheng Zhang, Hao Pan, Lin Lu et al.

Heat exchangers are critical components in a wide range of engineering applications, from energy systems to chemical processing, where efficient thermal management is essential. The design objectives for heat exchangers include maximizing the heat exchange rate while minimizing the pressure drop, requiring both a large interface area and a smooth internal structure. State-of-the-art designs, such as triply periodic minimal surfaces (TPMS), have proven effective in optimizing heat exchange efficiency. However, TPMS designs are constrained by predefined mathematical equations, limiting their adaptability to freeform boundary shapes. Additionally, TPMS structures do not inherently control flow directions, which can lead to flow stagnation and undesirable pressure drops. This paper presents DualMS, a novel computational framework for optimizing dual-channel minimal surfaces specifically for heat exchanger designs in freeform shapes. To the best of our knowledge, this is the first attempt to directly optimize minimal surfaces for two-fluid heat exchangers, rather than relying on TPMS. Our approach formulates the heat exchange maximization problem as a constrained connected maximum cut problem on a graph, with flow constraints guiding the optimization process. To address undesirable pressure drops, we model the minimal surface as a classification boundary separating the two fluids, incorporating an additional regularization term for area minimization. We employ a neural network that maps spatial points to binary flow types, enabling it to classify flow skeletons and automatically determine the surface boundary. DualMS demonstrates greater flexibility in surface topology compared to TPMS and achieves superior thermal performance, with lower pressure drops while maintaining a similar heat exchange rate under the same material cost.

NA Oct 14, 2024
Which Spaces can be Embedded in $L_p$-type Reproducing Kernel Banach Space? A Characterization via Metric Entropy

Yiping Lu, Daozhe Lin, Qiang Du

In this paper, we establish a novel connection between the metric entropy growth and the embeddability of function spaces into reproducing kernel Hilbert/Banach spaces. Metric entropy characterizes the information complexity of function spaces and has implications for their approximability and learnability. Classical results show that embedding a function space into a reproducing kernel Hilbert space (RKHS) implies a bound on its metric entropy growth. Surprisingly, we prove a converse: a bound on the metric entropy growth of a function space allows its embedding into an $L_p$-type Reproducing Kernel Banach Space (RKBS). This shows that the $L_p$-type RKBS provides a broad modeling framework for learnable function classes with controlled metric entropies. Our results shed new light on the power and limitations of kernel methods for learning complex function spaces.

DS Dec 20, 2021
Discovering State Variables Hidden in Experimental Data

Boyuan Chen, Kuang Huang, Sunand Raghupathi et al.

All physical laws are described as relationships between state variables that give a complete and non-redundant description of the relevant system dynamics. However, despite the prevalence of computing power and AI, the process of identifying the hidden state variables themselves has resisted automation. Most data-driven methods for modeling physical phenomena still assume that observed data streams already correspond to relevant state variables. A key challenge is to identify the possible sets of state variables from scratch, given only high-dimensional observational data. Here we propose a new principle for determining how many state variables an observed system is likely to have, and what these variables might be, directly from video streams. We demonstrate the effectiveness of this approach using video recordings of a variety of physical dynamical systems, ranging from elastic double pendulums to fire flames. Without any prior knowledge of the underlying physics, our algorithm discovers the intrinsic dimension of the observed dynamics and identifies candidate sets of state variables. We suggest that this approach could help catalyze the understanding, prediction and control of increasingly complex systems. Project website is at: https://www.cs.columbia.edu/~bchen/neural-state-variables

LG Jun 6, 2021
A Physics-Informed Deep Learning Paradigm for Traffic State and Fundamental Diagram Estimation

Rongye Shi, Zhaobin Mo, Kuang Huang et al.

Traffic state estimation (TSE) bifurcates into two categories, model-driven and data-driven (e.g., machine learning, ML), while each suffers from either deficient physics or small data. To mitigate these limitations, recent studies introduced a hybrid paradigm, physics-informed deep learning (PIDL), which contains both model-driven and data-driven components. This paper contributes an improved version, called physics-informed deep learning with a fundamental diagram learner (PIDL+FDL), which integrates ML terms into the model-driven component to learn a functional form of a fundamental diagram (FD), i.e., a mapping from traffic density to flow or velocity. The proposed PIDL+FDL has the advantages of performing the TSE learning, model parameter identification, and FD estimation simultaneously. We demonstrate the use of PIDL+FDL to solve popular first-order and second-order traffic flow models and reconstruct the FD relation as well as model parameters that are outside the FD terms. We then evaluate the PIDL+FDL-based TSE using the Next Generation SIMulation (NGSIM) dataset. The experimental results show the superiority of the PIDL+FDL in terms of improved estimation accuracy and data efficiency over advanced baseline TSE methods, and additionally, the capacity to properly learn the unknown underlying FD relation.

NA Mar 21, 2021
The Discovery of Dynamics via Linear Multistep Methods and Deep Learning: Error Estimation

Qiang Du, Yiqi Gu, Haizhao Yang et al.

Identifying hidden dynamics from observed data is a significant and challenging task in a wide range of applications. Recently, the combination of linear multistep methods (LMMs) and deep learning has been successfully employed to discover dynamics, whereas a complete convergence analysis of this approach is still under development. In this work, we consider the deep network-based LMMs for the discovery of dynamics. We put forward error estimates for these methods using the approximation property of deep networks. These estimates indicate, for certain families of LMMs, that the $\ell^2$ grid error is bounded by the sum of $O(h^p)$ and the network approximation error, where $h$ is the time step size and $p$ is the local truncation error order. Numerical results for several physically relevant examples are provided to demonstrate our theory.

LG Jan 17, 2021
Physics-Informed Deep Learning for Traffic State Estimation

Rongye Shi, Zhaobin Mo, Kuang Huang et al.

Traffic state estimation (TSE), which reconstructs the traffic variables (e.g., density) on road segments using partially observed data, plays an important role in the efficient traffic control and operation that intelligent transportation systems (ITS) need to provide to people. Over decades, TSE approaches have bifurcated into two main categories, model-driven approaches and data-driven approaches. However, each of them has limitations: the former relies heavily on existing physical traffic flow models, such as Lighthill-Whitham-Richards (LWR) models, which may only capture limited dynamics of real-world traffic, resulting in low-quality estimation, while the latter requires massive data in order to perform accurate and generalizable estimation. To mitigate the limitations, this paper introduces a physics-informed deep learning (PIDL) framework to efficiently conduct high-quality TSE with small amounts of observed data. PIDL contains both model-driven and data-driven components, making possible the integration of the strong points of both approaches while overcoming the shortcomings of either. This paper focuses on highway TSE with observed data from loop detectors, using traffic density as the traffic variable. We demonstrate the use of PIDL to solve (with data from loop detectors) two popular physical traffic flow models, i.e., Greenshields-based LWR and three-parameter-based LWR, and discover the model parameters. We then evaluate the PIDL-based highway TSE using the Next Generation SIMulation (NGSIM) dataset. The experimental results show the advantages of the PIDL-based approach in terms of estimation accuracy and data efficiency over advanced baseline TSE methods.

SP Apr 13, 2020
A non-cooperative meta-modeling game for automated third-party calibrating, validating, and falsifying constitutive laws with parallelized adversarial attacks

Kun Wang, WaiChing Sun, Qiang Du

The evaluation of constitutive models, especially for high-risk and high-regret engineering applications, requires efficient and rigorous third-party calibration, validation and falsification. While there are numerous efforts to develop paradigms and standard procedures to validate models, difficulties may arise due to the sequential, manual and often biased nature of the commonly adopted calibration and validation processes, thus slowing down data collection, hampering the progress towards discovering new physics, increasing expenses and possibly leading to misinterpretations of the credibility and application ranges of proposed models. This work attempts to introduce concepts from game theory and machine learning techniques to overcome many of these existing difficulties. We introduce an automated meta-modeling game where two competing AI agents systematically generate experimental data to calibrate a given constitutive model and to explore its weaknesses, in order to improve experiment design and model robustness through competition. The two agents automatically search for the Nash equilibrium of the meta-modeling game in an adversarial reinforcement learning framework without human intervention. By capturing all possible design options of the laboratory experiments into a single decision tree, we recast the design of experiments as a game of combinatorial moves that can be resolved through deep reinforcement learning by the two competing players. Our adversarial framework emulates idealized scientific collaborations and competitions among researchers to achieve a better understanding of the application range of the learned material laws and prevent misinterpretations caused by conventional AI-based third-party validation.

NA Dec 29, 2019
Discovery of Dynamics Using Linear Multistep Methods

Rachael Keller, Qiang Du

Linear multistep methods (LMMs) are popular time discretization techniques for the numerical solution of differential equations. Traditionally they are applied to solve for the state given the dynamics (the forward problem), but here we consider their application for learning the dynamics given the state (the inverse problem). This repurposing of LMMs is largely motivated by growing interest in data-driven modeling of dynamics, but the behavior and analysis of LMMs for discovery turn out to be significantly different from the well-known, existing theory for the forward problem. Assuming a highly idealized setting of being given the exact state with a zero residual of the discrete dynamics, we establish for the first time a rigorous framework based on refined notions of consistency and stability to yield convergence using LMMs for discovery. When applying these concepts to three popular $M$-step LMMs, the Adams-Bashforth, Adams-Moulton, and Backward Differentiation Formula schemes, the new theory suggests that Adams-Bashforth for $M$ ranging from $1$ to $6$, Adams-Moulton for $M=0$ and $M=1$, and Backward Differentiation Formula for all positive $M$ are convergent, and, otherwise, the methods are not convergent in general. In addition, we provide numerical experiments to both motivate and substantiate our theoretical analysis.
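
The inverse use of an LMM described in this abstract can be shown with a short sketch. It is only an illustration, not the paper's framework: it assumes the two-step Backward Differentiation Formula (BDF2), exact samples of the scalar test problem x' = -x, and a hypothetical helper name. For BDF schemes the discovery step is explicit, since the unknown dynamics appear only at the newest grid point: (3 x_n - 4 x_{n-1} + x_{n-2}) / (2h) = f(x_n).

```python
import math


def discover_dynamics_bdf2(states, h):
    """Estimate f(x(t_n)) for n >= 2 from sampled states via the BDF2 relation
    (3*x_n - 4*x_{n-1} + x_{n-2}) / (2*h) = f(x_n)."""
    return [
        (3.0 * states[n] - 4.0 * states[n - 1] + states[n - 2]) / (2.0 * h)
        for n in range(2, len(states))
    ]


h = 0.01
# Exact samples of x' = -x with x(0) = 1 (the idealized noise-free setting).
states = [math.exp(-n * h) for n in range(101)]
f_est = discover_dynamics_bdf2(states, h)
# Compare the recovered values with the true dynamics f(x) = -x on the grid.
max_err = max(abs(f + x) for f, x in zip(f_est, states[2:]))
print(max_err)
```

Because BDF2 is second-order accurate, the grid error of the recovered dynamics should shrink roughly by a factor of four when h is halved. Adams-type schemes instead couple the unknown values of f across several grid points, so discovery with them requires solving a linear system.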

LG Mar 8, 2019
A cooperative game for automated learning of elasto-plasticity knowledge graphs and models with AI-guided experimentation

Kun Wang, WaiChing Sun, Qiang Du

We introduce a multi-agent meta-modeling game to generate data, knowledge, and models that make predictions on constitutive responses of elasto-plastic materials. We introduce a new concept from graph theory where a modeler agent is tasked with evaluating all the modeling options recast as a directed multigraph and finding the optimal path that links the source of the directed graph (e.g. strain history) to the target (e.g. stress) measured by an objective function. Meanwhile, the data agent, which is tasked with generating data from real or virtual experiments (e.g. molecular dynamics, discrete element simulations), interacts with the modeling agent sequentially and uses reinforcement learning to design new experiments to optimize the prediction capacity. Consequently, this treatment enables us to emulate an idealized scientific collaboration as selections of the optimal choices in a decision tree search done automatically via deep reinforcement learning.

NA Apr 23, 2019
The Phase Field Method for Geometric Moving Interfaces and Their Numerical Approximations

Qiang Du, Xiaobing Feng

This paper surveys recent numerical advances in the phase field method for geometric surface evolution and related geometric nonlinear partial differential equations (PDEs). Instead of describing technical details of various numerical methods and their analyses, the paper presents a holistic overview of the main ideas of phase field modeling, its mathematical foundation, and relationships between the phase field formalism and other mathematical formalisms for geometric moving interface problems, as well as the current state of the art of numerical approximations of various phase field models, with an emphasis on the main ideas of numerical analysis techniques. The paper also reviews recent developments in adaptive grid methods and various applications of phase field modeling and its numerical methods in materials science, fluid mechanics, biology and image science.

LG Dec 1, 2018
Stochastic Training of Residual Networks: a Differential Equation Viewpoint

Qi Sun, Yunzhe Tao, Qiang Du

During the last few years, significant attention has been paid to the stochastic training of artificial neural networks, which is known as an effective regularization approach that helps improve the generalization capability of trained models. In this work, the method of modified equations is applied to show that the residual network and its variants with noise injection can be regarded as weak approximations of stochastic differential equations. Such observations enable us to bridge the stochastic training processes with the optimal control of backward Kolmogorov equations. This not only offers a novel perspective on the effects of regularization from the loss landscape viewpoint but also sheds light on the design of more reliable and efficient stochastic training strategies. As an example, we propose a new way to utilize Bernoulli dropout within the plain residual network architecture and conduct experiments on a real-world image classification task to substantiate our theoretical findings.

LG Jun 2, 2018
Hierarchical Attention-Based Recurrent Highway Networks for Time Series Prediction

Yunzhe Tao, Lin Ma, Weizhong Zhang et al.

Time series prediction has been studied in a variety of domains. However, it is still challenging to predict future series given historical observations and past exogenous data. Existing methods either fail to consider the interactions among different components of exogenous variables which may affect the prediction accuracy, or cannot model the correlations between exogenous data and target data. Besides, the inherent temporal dynamics of exogenous data are also related to the target series prediction, and thus should be considered as well. To address these issues, we propose an end-to-end deep learning model, i.e., Hierarchical attention-based Recurrent Highway Network (HRHN), which incorporates spatio-temporal feature extraction of exogenous variables and temporal dynamics modeling of target variables into a single framework. Moreover, by introducing the hierarchical attention mechanism, HRHN can adaptively select the relevant exogenous features in different semantic levels. We carry out comprehensive empirical evaluations with various methods over several datasets, and show that HRHN outperforms the state of the art in time series prediction, especially in capturing sudden changes and sudden oscillations of time series.

LG Jun 2, 2018
Nonlocal Neural Networks, Nonlocal Diffusion and Nonlocal Modeling

Yunzhe Tao, Qi Sun, Qiang Du et al.

Nonlocal neural networks have been proposed and shown to be effective in several computer vision tasks, where the nonlocal operations can directly capture long-range dependencies in the feature space. In this paper, we study the nature of the diffusion and damping effects of nonlocal networks by performing spectral analysis on the weight matrices of the well-trained networks, and then propose a new formulation of the nonlocal block. The new block not only learns the nonlocal interactions but also has stable dynamics, thus allowing deeper nonlocal structures. Moreover, we interpret our formulation from the general nonlocal modeling perspective, where we make connections between the proposed nonlocal network and other nonlocal models, such as nonlocal diffusion processes and Markov jump processes.

CL May 9, 2018
A Reinforced Topic-Aware Convolutional Sequence-to-Sequence Model for Abstractive Text Summarization

Li Wang, Junlin Yao, Yunzhe Tao et al.

In this paper, we propose a deep learning approach to tackle automatic summarization tasks by incorporating topic information into the convolutional sequence-to-sequence (ConvS2S) model and using self-critical sequence training (SCST) for optimization. Through jointly attending to topics and word-level alignment, our approach can improve coherence, diversity, and informativeness of generated summaries via a biased probability generation mechanism. On the other hand, reinforcement training, like SCST, directly optimizes the proposed model with respect to the non-differentiable metric ROUGE, which also avoids the exposure bias during inference. We carry out the experimental evaluation with state-of-the-art methods over the Gigaword, DUC-2004, and LCSTS datasets. The empirical results demonstrate the superiority of our proposed method in abstractive summarization.