CVJul 16, 2022Code
Clover: Towards A Unified Video-Language Alignment and Fusion ModelJingjia Huang, Yinan Li, Jiashi Feng et al.
Building a universal Video-Language model for solving various video understanding tasks (\emph{e.g.}, text-video retrieval, video question answering) is an open challenge to the machine learning field. Towards this goal, most recent works build the model by stacking uni-modal and cross-modal feature encoders and train it with pair-wise contrastive pre-text tasks. Though offering attractive generality, the resulted models have to compromise between efficiency and performance. They mostly adopt different architectures to deal with different downstream tasks. We find this is because the pair-wise training cannot well \emph{align} and \emph{fuse} features from different modalities. We then introduce \textbf{Clover}\textemdash a Correlated Video-Language pre-training method\textemdash towards a universal Video-Language model for solving multiple video understanding tasks with neither performance nor efficiency compromise. It improves cross-modal feature alignment and fusion via a novel tri-modal alignment pre-training task. Additionally, we propose to enhance the tri-modal alignment via incorporating learning from semantic masked samples and a new pair-wise ranking loss. Clover establishes new state-of-the-arts on multiple downstream tasks, including three retrieval tasks for both zero-shot and fine-tuning settings, and eight video question answering tasks. Codes and pre-trained models will be released at \url{https://github.com/LeeYN-43/Clover}.
CVOct 27, 2023Code
Semi-Supervised Panoptic Narrative GroundingDanni Yang, Jiayi Ji, Xiaoshuai Sun et al.
Despite considerable progress, the advancement of Panoptic Narrative Grounding (PNG) remains hindered by costly annotations. In this paper, we introduce a novel Semi-Supervised Panoptic Narrative Grounding (SS-PNG) learning scheme, capitalizing on a smaller set of labeled image-text pairs and a larger set of unlabeled pairs to achieve competitive performance. Unlike visual segmentation tasks, PNG involves one pixel belonging to multiple open-ended nouns. As a result, existing multi-class based semi-supervised segmentation frameworks cannot be directly applied to this task. To address this challenge, we first develop a novel SS-PNG Network (SS-PNG-NW) tailored to the SS-PNG setting. We thoroughly investigate strategies such as Burn-In and data augmentation to determine the optimal generic configuration for the SS-PNG-NW. Additionally, to tackle the issue of imbalanced pseudo-label quality, we propose a Quality-Based Loss Adjustment (QLA) approach to adjust the semi-supervised objective, resulting in an enhanced SS-PNG-NW+. Employing our proposed QLA, we improve BCE Loss and Dice loss at pixel and mask levels, respectively. We conduct extensive experiments on PNG datasets, with our SS-PNG-NW+ demonstrating promising results comparable to fully-supervised models across all data ratios. Remarkably, our SS-PNG-NW+ outperforms fully-supervised models with only 30% and 50% supervision data, exceeding their performance by 0.8% and 1.1% respectively. This highlights the effectiveness of our proposed SS-PNG-NW+ in overcoming the challenges posed by limited annotations and enhancing the applicability of PNG tasks. The source code is available at https://github.com/nini0919/SSPNG.
MLApr 18, 2022
Adaptive Noisy Data Augmentation for Regularized Estimation and Inference in Generalized Linear ModelsYinan Li, Fang Liu
We propose the AdaPtive Noise Augmentation (PANDA) procedure to regularize the estimation and inference of generalized linear models (GLMs). PANDA iteratively optimizes the objective function given noise augmented data until convergence to obtain the regularized model estimates. The augmented noises are designed to achieve various regularization effects, including $l_0$, bridge (lasso and ridge included), elastic net, adaptive lasso, and SCAD, as well as group lasso and fused ridge. We examine the tail bound of the noise-augmented loss function and establish the almost sure convergence of the noise-augmented loss function and its minimizer to the expected penalized loss function and its minimizer, respectively. We derive the asymptotic distributions for the regularized parameters, based on which, inferences can be obtained simultaneously with variable selection. PANDA exhibits ensemble learning behaviors that help further decrease the generalization error. Computationally, PANDA is easy to code, leveraging existing software for implementing GLMs, without resorting to complicated optimization techniques. We demonstrate the superior or similar performance of PANDA against the existing approaches of the same type of regularizers in simulated and real-life data. We show that the inferences through PANDA achieve nominal or near-nominal coverage and are far more efficient compared to a popular existing post-selection procedure.
QUANT-PHOct 11, 2023
Non-asymptotic Approximation Error Bounds of Parameterized Quantum CircuitsZhan Yu, Qiuhao Chen, Yuling Jiao et al.
Parameterized quantum circuits (PQCs) have emerged as a promising approach for quantum neural networks. However, understanding their expressive power in accomplishing machine learning tasks remains a crucial question. This paper investigates the expressivity of PQCs for approximating general multivariate function classes. Unlike previous Universal Approximation Theorems for PQCs, which are either nonconstructive or rely on parameterized classical data processing, we explicitly construct data re-uploading PQCs for approximating multivariate polynomials and smooth functions. We establish the first non-asymptotic approximation error bounds for these functions in terms of the number of qubits, quantum circuit depth, and number of trainable parameters. Notably, we demonstrate that for approximating functions that satisfy specific smoothness criteria, the quantum circuit size and number of trainable parameters of our proposed PQCs can be smaller than those of deep ReLU neural networks. We further validate the approximation capability of PQCs through numerical experiments. Our results provide a theoretical foundation for designing practical PQCs and quantum neural networks for machine learning tasks that can be implemented on near-term quantum devices, paving the way for the advancement of quantum machine learning.
73.5CCApr 1
On the average-case complexity landscape for Tensor-Isomorphism-complete problems over finite fieldsTiange Li, Yinan Li, Youming Qiao et al.
In Grochow and Qiao (SIAM J. Comput., 2021), the complexity class Tensor Isomorphism (TI) was introduced and isomorphism problems for groups, algebras, and polynomials were shown to be TI-complete. In this paper, we study average-case algorithms for several TI-complete problems over finite fields, including algebra isomorphism, matrix code conjugacy, and $4$-tensor isomorphism. Our main results are as follows. Over the finite field of order $q$, we devise (1) average-case polynomial-time algorithms for algebra isomorphism and matrix code conjugacy that succeed in a $1/Î(q)$ fraction of inputs and (2) an average-case polynomial-time algorithm for the $4$-tensor isomorphism that succeeds in a $1/q^{Î(1)}$ fraction of inputs. Prior to our work, algorithms for algebra isomorphism with rigorous average-case analyses ran in exponential time, albeit succeeding on a larger fraction of inputs (Li--Qiao, FOCS'17; Brooksbank--Li--Qiao--Wilson, ESA'20; Grochow--Qiao--Tang, STACS'21). These results reveal a finer landscape of the average-case complexities of TI-complete problems, providing guidance for cryptographic systems based on isomorphism problems. Our main technical contribution is to introduce the spectral properties of random matrices into algorithms for TI-complete problems. This leads to not only new algorithms but also new questions in random matrix theory over finite fields. To settle these questions, we need to extend both the generating function approach as in Neumann and Praeger (J. London Math. Soc., 1998) and the characteristic sum method of Gorodetsky and Rodgers (Trans. Amer. Math. Soc., 2021).
42.9LGMay 11
$\varepsilon$-Good Action Identification in Fixed-Budget Monte Carlo Tree SearchYinan Li, Tuan Nguyen, Kwang-Sung Jun
We study the fixed-budget max-min action identification problem in depth-2 max-min trees, an important special case of Monte Carlo Tree Search. A learner sequentially allocates $T$ samples to leaves and then recommends a subtree whose minimum leaf value is largest. Motivated by approximate planning, we focus on $\varepsilon$-good subtree identification, where any subtree whose min value is within $\varepsilon$ of the optimal maximin value is acceptable. Our main contribution is an $\varepsilon$-agnostic algorithm: it does not require $\varepsilon$ as input, but achieves instance-dependent error bounds for every meaningful $\varepsilon$. We show that the misidentification probability decays as $\exp(-\widetildeΘ(T/H_2(\varepsilon)))$, where $H_2(\varepsilon)$ captures both cross-subtree and within-subtree gaps. When each subtree has a single leaf, the problem reduces to standard fixed-budget best-arm identification, and our analysis recovers, up to accelerating factors, known $\varepsilon$-good guarantees for halving-style methods while giving a new $\varepsilon$-good guarantee for Successive Rejects. On the lower-bound side, we provide complementary positive and negative results showing that max-min identification has a different hardness structure from standard $K$-armed bandits. To our knowledge, this is the first provable fixed-budget algorithmic guarantee for max-min action identification.
AIDec 21, 2023
HGE: Embedding Temporal Knowledge Graphs in a Product Space of Heterogeneous Geometric SubspacesJiaxin Pan, Mojtaba Nayyeri, Yinan Li et al.
Temporal knowledge graphs represent temporal facts $(s,p,o,τ)$ relating a subject $s$ and an object $o$ via a relation label $p$ at time $τ$, where $τ$ could be a time point or time interval. Temporal knowledge graphs may exhibit static temporal patterns at distinct points in time and dynamic temporal patterns between different timestamps. In order to learn a rich set of static and dynamic temporal patterns and apply them for inference, several embedding approaches have been suggested in the literature. However, as most of them resort to single underlying embedding spaces, their capability to model all kinds of temporal patterns was severely limited by having to adhere to the geometric property of their one embedding space. We lift this limitation by an embedding approach that maps temporal facts into a product space of several heterogeneous geometric subspaces with distinct geometric properties, i.e.\ Complex, Dual, and Split-complex spaces. In addition, we propose a temporal-geometric attention mechanism to integrate information from different geometric subspaces conveniently according to the captured relational and temporal information. Experimental results on standard temporal benchmark datasets favorably evaluate our approach against state-of-the-art models.
LGOct 23, 2023
Efficient Active Learning Halfspaces with Tsybakov Noise: A Non-convex Optimization ApproachYinan Li, Chicheng Zhang
We study the problem of computationally and label efficient PAC active learning $d$-dimensional halfspaces with Tsybakov Noise~\citep{tsybakov2004optimal} under structured unlabeled data distributions. Inspired by~\cite{diakonikolas2020learning}, we prove that any approximate first-order stationary point of a smooth nonconvex loss function yields a halfspace with a low excess error guarantee. In light of the above structural result, we design a nonconvex optimization-based algorithm with a label complexity of $\tilde{O}(d (\frac{1}ε)^{\frac{8-6α}{3α-1}})$, under the assumption that the Tsybakov noise parameter $α\in (\frac13, 1]$, which narrows down the gap between the label complexities of the previously known efficient passive or active algorithms~\citep{diakonikolas2020polynomial,zhang2021improved} and the information-theoretic lower bound in this setting.
CVMay 30, 2025
Unleashing High-Quality Image Generation in Diffusion Sampling Using Second-Order Levenberg-Marquardt-LangevinFangyikang Wang, Hubery Yin, Lei Qian et al.
The diffusion models (DMs) have demonstrated the remarkable capability of generating images via learning the noised score function of data distribution. Current DM sampling techniques typically rely on first-order Langevin dynamics at each noise level, with efforts concentrated on refining inter-level denoising strategies. While leveraging additional second-order Hessian geometry to enhance the sampling quality of Langevin is a common practice in Markov chain Monte Carlo (MCMC), the naive attempts to utilize Hessian geometry in high-dimensional DMs lead to quadratic-complexity computational costs, rendering them non-scalable. In this work, we introduce a novel Levenberg-Marquardt-Langevin (LML) method that approximates the diffusion Hessian geometry in a training-free manner, drawing inspiration from the celebrated Levenberg-Marquardt optimization algorithm. Our approach introduces two key innovations: (1) A low-rank approximation of the diffusion Hessian, leveraging the DMs' inherent structure and circumventing explicit quadratic-complexity computations; (2) A damping mechanism to stabilize the approximated Hessian. This LML approximated Hessian geometry enables the diffusion sampling to execute more accurate steps and improve the image generation quality. We further conduct a theoretical analysis to substantiate the approximation error bound of low-rank approximation and the convergence property of the damping mechanism. Extensive experiments across multiple pretrained DMs validate that the LML method significantly improves image generation quality, with negligible computational overhead.
LGMay 29, 2025
Efficiently Access Diffusion Fisher: Within the Outer Product Span SpaceFangyikang Wang, Hubery Yin, Shaobin Zhuang et al.
Recent Diffusion models (DMs) advancements have explored incorporating the second-order diffusion Fisher information (DF), defined as the negative Hessian of log density, into various downstream tasks and theoretical analysis. However, current practices typically approximate the diffusion Fisher by applying auto-differentiation to the learned score network. This black-box method, though straightforward, lacks any accuracy guarantee and is time-consuming. In this paper, we show that the diffusion Fisher actually resides within a space spanned by the outer products of score and initial data. Based on the outer-product structure, we develop two efficient approximation algorithms to access the trace and matrix-vector multiplication of DF, respectively. These algorithms bypass the auto-differentiation operations with time-efficient vector-product calculations. Furthermore, we establish the approximation error bounds for the proposed algorithms. Experiments in likelihood evaluation and adjoint optimization demonstrate the superior accuracy and reduced computational cost of our proposed algorithms. Additionally, based on the novel outer-product formulation of DF, we design the first numerical verification experiment for the optimal transport property of the general PF-ODE deduced map.
MLFeb 3
Fixed Budget is No Harder Than Fixed Confidence in Best-Arm Identification up to Logarithmic FactorsKapilan Balagopalan, Yinan Li, Yao Zhao et al.
The best-arm identification (BAI) problem is one of the most fundamental problems in interactive machine learning, which has two flavors: the fixed-budget setting (FB) and the fixed-confidence setting (FC). For $K$-armed bandits with the unique best arm, the optimal sample complexities for both settings have been settled down, and they match up to logarithmic factors. This prompts an interesting research question about the generic, potentially structured BAI problems: Is FB harder than FC or the other way around? In this paper, we show that FB is no harder than FC up to logarithmic factors. We do this constructively: we propose a novel algorithm called FC2FB (fixed confidence to fixed budget), which is a meta algorithm that takes in an FC algorithm $\mathcal{A}$ and turn it into an FB algorithm. We prove that this FC2FB enjoys a sample complexity that matches, up to logarithmic factors, that of the sample complexity of $\mathcal{A}$. This means that the optimal FC sample complexity is an upper bound of the optimal FB sample complexity up to logarithmic factors. Our result not only reveals a fundamental relationship between FB and FC, but also has a significant implication: FC2FB, combined with existing state-of-the-art FC algorithms, leads to improved sample complexity for a number of FB problems.
LGJul 16, 2025
Second-Order Bounds for [0,1]-Valued Regression via Betting LossYinan Li, Kwang-Sung Jun
We consider the $[0,1]$-valued regression problem in the i.i.d. setting. In a related problem called cost-sensitive classification, \citet{foster21efficient} have shown that the log loss minimizer achieves an improved generalization bound compared to that of the squared loss minimizer in the sense that the bound scales with the cost of the best classifier, which can be arbitrarily small depending on the problem at hand. Such a result is often called a first-order bound. For $[0,1]$-valued regression, we first show that the log loss minimizer leads to a similar first-order bound. We then ask if there exists a loss function that achieves a variance-dependent bound (also known as a second order bound), which is a strict improvement upon first-order bounds. We answer this question in the affirmative by proposing a novel loss function called the betting loss. Our result is ``variance-adaptive'' in the sense that the bound is attained \textit{without any knowledge about the variance}, which is in contrast to modeling label (or reward) variance or the label distribution itself explicitly as part of the function class such as distributional reinforcement learning.
QUANT-PHJun 18, 2024
Quantum Compiling with Reinforcement Learning on a Superconducting ProcessorZ. T. Wang, Qiuhao Chen, Yuxuan Du et al.
To effectively implement quantum algorithms on noisy intermediate-scale quantum (NISQ) processors is a central task in modern quantum technology. NISQ processors feature tens to a few hundreds of noisy qubits with limited coherence times and gate operations with errors, so NISQ algorithms naturally require employing circuits of short lengths via quantum compilation. Here, we develop a reinforcement learning (RL)-based quantum compiler for a superconducting processor and demonstrate its capability of discovering novel and hardware-amenable circuits with short lengths. We show that for the three-qubit quantum Fourier transformation, a compiled circuit using only seven CZ gates with unity circuit fidelity can be achieved. The compiler is also able to find optimal circuits under device topological constraints, with lengths considerably shorter than those by the conventional method. Our study exemplifies the codesign of the software with hardware for efficient quantum compilation, offering valuable insights for the advancement of RL-based compilers.
MLOct 16, 2021
Noise-Augmented Privacy-Preserving Empirical Risk Minimization with Dual-purpose Regularizer and Privacy Budget Retrieval and RecyclingYinan Li, Fang Liu
We propose Noise-Augmented Privacy-Preserving Empirical Risk Minimization (NAPP-ERM) that solves ERM with differential privacy guarantees. Existing privacy-preserving ERM approaches may be subject to over-regularization with the employment of an l2 term to achieve strong convexity on top of the target regularization. NAPP-ERM improves over the current approaches and mitigates over-regularization by iteratively realizing target regularization through appropriately designed augmented data and delivering strong convexity via a single adaptively weighted dual-purpose l2 regularizer. When the target regularization is for variable selection, we propose a new regularizer that achieves both privacy and sparsity guarantees simultaneously. Finally, we propose a strategy to retrieve privacy budget when the strong convexity requirement is met, which can be returned to users such that the DP of ERM is guaranteed at a lower privacy cost than originally planned, or be recycled to the ERM optimization procedure to reduce the injected DP noise and improve the utility of DP-ERM. From an implementation perspective, NAPP-ERM can be achieved by optimizing a non-perturbed object function given noise-augmented data and can thus leverage existing tools for non-private ERM optimization. We illustrate through extensive experiments the mitigation effect of the over-regularization and private budget retrieval by NAPP-ERM on variable selection and prediction.
LGFeb 10, 2021
Improved Algorithms for Efficient Active Learning Halfspaces with Massart and Tsybakov noiseChicheng Zhang, Yinan Li
We give a computationally-efficient PAC active learning algorithm for $d$-dimensional homogeneous halfspaces that can tolerate Massart noise (Massart and Nédélec, 2006) and Tsybakov noise (Tsybakov, 2004). Specialized to the $η$-Massart noise setting, our algorithm achieves an information-theoretically near-optimal label complexity of $\tilde{O}\left( \frac{d}{(1-2η)^2} \mathrm{polylog}(\frac1ε) \right)$ under a wide range of unlabeled data distributions (specifically, the family of "structured distributions" defined in Diakonikolas et al. (2020)). Under the more challenging Tsybakov noise condition, we identify two subfamilies of noise conditions, under which our efficient algorithm provides label complexity guarantees strictly lower than passive learning algorithms.
DBApr 22, 2020
Qd-tree: Learning Data Layouts for Big Data AnalyticsZongheng Yang, Badrish Chandramouli, Chi Wang et al.
Corporations today collect data at an unprecedented and accelerating scale, making the need to run queries on large datasets increasingly important. Technologies such as columnar block-based data organization and compression have become standard practice in most commercial database systems. However, the problem of best assigning records to data blocks on storage is still open. For example, today's systems usually partition data by arrival time into row groups, or range/hash partition the data based on selected fields. For a given workload, however, such techniques are unable to optimize for the important metric of the number of blocks accessed by a query. This metric directly relates to the I/O cost, and therefore performance, of most analytical queries. Further, they are unable to exploit additional available storage to drive this metric down further. In this paper, we propose a new framework called a query-data routing tree, or qd-tree, to address this problem, and propose two algorithms for their construction based on greedy and deep reinforcement learning techniques. Experiments over benchmark and real workloads show that a qd-tree can provide physical speedups of more than an order of magnitude compared to current blocking schemes, and can reach within 2X of the lower bound for data skipping based on selectivity, while providing complete semantic descriptions of created blocks.
AIApr 2, 2020
Continuous Motion Planning with Temporal Logic Specifications using Deep Neural NetworksChuanzheng Wang, Yinan Li, Stephen L. Smith et al.
In this paper, we propose a model-free reinforcement learning method to synthesize control policies for motion planning problems with continuous states and actions. The robot is modelled as a labeled discrete-time Markov decision process (MDP) with continuous state and action spaces. Linear temporal logics (LTL) are used to specify high-level tasks. We then train deep neural networks to approximate the value function and policy using an actor-critic reinforcement learning method. The LTL specification is converted into an annotated limit-deterministic Büchi automaton (LDBA) for continuously shaping the reward so that dense rewards are available during training. A naïve way of solving a motion planning problem with LTL specifications using reinforcement learning is to sample a trajectory and then assign a high reward for training if the trajectory satisfies the entire LTL formula. However, the sampling complexity needed to find such a trajectory is too high when we have a complex LTL formula for continuous state and action spaces. As a result, it is very unlikely that we get enough reward for training if all sample trajectories start from the initial state in the automata. In this paper, we propose a method that samples not only an initial state from the state space, but also an arbitrary state in the automata at the beginning of each training episode. We test our algorithm in simulation using a car-like robot and find out that our method can learn policies for different working configurations and LTL specifications successfully.
DBMay 21, 2019
ALEX: An Updatable Adaptive Learned IndexJialin Ding, Umar Farooq Minhas, Jia Yu et al.
Recent work on "learned indexes" has changed the way we look at the decades-old field of DBMS indexing. The key idea is that indexes can be thought of as "models" that predict the position of a key in a dataset. Indexes can, thus, be learned. The original work by Kraska et al. shows that a learned index beats a B+Tree by a factor of up to three in search time and by an order of magnitude in memory footprint. However, it is limited to static, read-only workloads. In this paper, we present a new learned index called ALEX which addresses practical issues that arise when implementing learned indexes for workloads that contain a mix of point lookups, short range queries, inserts, updates, and deletes. ALEX effectively combines the core insights from learned indexes with proven storage and indexing techniques to achieve high performance and low memory footprint. On read-only workloads, ALEX beats the learned index from Kraska et al. by up to 2.2X on performance with up to 15X smaller index size. Across the spectrum of read-write workloads, ALEX beats B+Trees by up to 4.1X while never performing worse, with up to 2000X smaller index size. We believe ALEX presents a key step towards making learned indexes practical for a broader class of database workloads with dynamic updates.
RONov 11, 2018
Reactive Task and Motion Planning for Robust Whole-Body Dynamic Locomotion in Constrained EnvironmentsYe Zhao, Yinan Li, Luis Sentis et al.
Contact-based decision and planning methods are becoming increasingly important to endow higher levels of autonomy for legged robots. Formal synthesis methods derived from symbolic systems have great potential for reasoning about high-level locomotion decisions and achieving complex maneuvering behaviors with correctness guarantees. This study takes a first step toward formally devising an architecture composed of task planning and control of whole-body dynamic locomotion behaviors in constrained and dynamically changing environments. At the high level, we formulate a two-player temporal logic game between the multi-limb locomotion planner and its dynamic environment to synthesize a winning strategy that delivers symbolic locomotion actions. These locomotion actions satisfy the desired high-level task specifications expressed in a fragment of temporal logic. Those actions are sent to a robust finite transition system that synthesizes a locomotion controller that fulfills state reachability constraints. This controller is further executed via a low-level motion planner that generates feasible locomotion trajectories. We construct a set of dynamic locomotion models for legged robots to serve as a template library for handling diverse environmental events. We devise a replanning strategy that takes into consideration sudden environmental changes or large state disturbances to increase the robustness of the resulting locomotion behaviors. We formally prove the correctness of the layered locomotion framework guaranteeing a robust implementation by the motion planning layer. Simulations of reactive locomotion behaviors in diverse environments indicate that our framework has the potential to serve as a theoretical foundation for intelligent locomotion behaviors.
MLOct 11, 2018
PANDA: AdaPtive Noisy Data Augmentation for Regularization of Undirected Graphical ModelsYinan Li, Xiao Liu, Fang Liu
We propose an AdaPtive Noise Augmentation (PANDA) technique to regularize the estimation and construction of undirected graphical models. PANDA iteratively optimizes the objective function given the noise augmented data until convergence to achieve regularization on model parameters. The augmented noises can be designed to achieve various regularization effects on graph estimation, such as the bridge (including lasso and ridge), elastic net, adaptive lasso, and SCAD penalization; it also realizes the group lasso and fused ridge. We examine the tail bound of the noise-augmented loss function and establish that the noise-augmented loss function and its minimizer converge almost surely to the expected penalized loss function and its minimizer, respectively. We derive the asymptotic distributions for the regularized parameters through PANDA in generalized linear models, based on which, inferences for the parameters can be obtained simultaneously with variable selection. We show the non-inferior performance of PANDA in constructing graphs of different types in simulation studies and apply PANDA to an autism spectrum disorder data to construct a mixed-node graph. We also show that the inferences based on the asymptotic distribution of regularized parameter estimates via PANDA achieve nominal or near-nominal coverage and are far more efficient, compared to some existing post-selection procedures. Computationally, PANDA can be easily programmed in software that implements (GLMs) without resorting to complicated optimization techniques.
MLDec 5, 2016
Whiteout: Gaussian Adaptive Noise Regularization in Deep Neural NetworksYinan Li, Fang Liu
Noise injection (NI) is an efficient technique to mitigate over-fitting in neural networks (NNs). The Bernoulli NI procedure as implemented in dropout and shakeout has connections with $l_1$ and $l_2$ regularization for the NN model parameters. We propose whiteout, a family NI regularization techniques (NIRT) through injecting adaptive Gaussian noises during the training of NNs. Whiteout is the first NIRT than imposes a broad range of the $l_γ$ sparsity regularization $(γ\in(0,2))$ without having to involving the $l_2$ regularization. Whiteout can also be extended to offer regularizations similar to the adaptive lasso and group lasso. We establish the regularization effect of whiteout in the framework of generalized linear models with closed-form penalty terms and show that whiteout stabilizes the training of NNs with decreased sensitivity to small perturbations in the input. We establish that the noise-perturbed empirical loss function (pelf) with whiteout converges almost surely to the ideal loss function (ilf), and the minimizer of the pelf is consistent for the minimizer of the ilf. We derive the tail bound on the pelf to establish the practical feasibility in its minimization. The superiority of whiteout over the Bernoulli NIRTs, dropout and shakeout, in learning NNs with relatively small-sized training sets and non-inferiority in large-sized training sets is demonstrated in both simulated and real-life data sets. This work represents the first in-depth theoretical, methodological, and practical examination of the regularization effects of both additive and multiplicative Gaussian NI in deep NNs.
OCAug 30, 2016
Invariance Control Synthesis for Switched Systems: An Interval Analysis ApproachYinan Li, Jun Liu
This paper focuses on the invariance control problem for discrete-time switched nonlinear systems. The proposed approach computes controlled invariant sets in a finite number of iterations and directly yields a partition-based invariance controller using the information recorded during the computation. In contrast with Lyapunov-based control methods, this method does not require the subsystems to have common equilibrium points. Algorithms are developed for computing both outer and inner approximations of the maximal controlled invariant sets, which are represented as finite unions of intervals. The general convergence results of interval methods allow us to obtain arbitrarily precise approximations without any stability assumptions. In addition, invariant inner approximations can be computed provided that the switched system satisfies a robustly controlled invariance condition. Under the same condition, we also prove the existence of an invariance controller based on partitions of the state space. Our method is illustrated with three examples drawn from different applications and compared with existing work in the literature.
SYJul 22, 2015
Computing finite abstractions with robustness margins via local reachable set over-approximationYinan Li, Jun Liu, Necmiye Ozay
This paper proposes a method to compute finite abstractions that can be used for synthesizing robust hybrid control strategies for nonlinear systems. Most existing methods for computing finite abstractions utilize some global, analytical function to provide bounds on the reachable sets of nonlinear systems, which can be conservative and lead to spurious transitions in the abstract systems. This problem is even more pronounced in the presence of imperfect measurements and modelling uncertainties, where control synthesis can easily become infeasible due to added spurious transitions. To mitigate this problem, we propose to compute finite abstractions with robustness margins by over-approximating the local reachable sets of nonlinear systems. We do so by linearizing the nonlinear dynamics into linear affine systems and keeping track of the linearization error. It is shown that this approach provides tighter approximations and several numerical examples are used to illustrate of effectiveness of the proposed methods.