17.5LGMay 8
When Symbol Names Should Not Matter: A Logistic Theory of Fresh-Symbol ClassificationWenjie Guan, Jelena Bradic
Template tasks have emerged as a clean testbed for asking whether transformers reason with abstract symbols rather than concrete token names. We study the fixed-label classification version of this problem, where train and test examples share latent templates but may use disjoint vocabularies. Unlike next-token prediction, the model need not emit unseen symbols; it must learn a decision rule invariant to symbol renaming. We analyze regularized kernel logistic classification in the transformer-kernel regime. Our main result decomposes the learned predictor into an ideal template-level classifier and a finite-sample perturbation caused by accidental token overlaps in the training data. We encode these overlaps by a colored collision graph and prove high-probability margin-transfer guarantees for fresh-symbol classification. This perspective extends template-based analyses to logistic classification and refines scalar diversity conditions: vocabulary size controls the average rate of collisions, but collision geometry controls whether the ideal classification margin is preserved. More broadly, the same perturbation framework applies to abstraction-augmented inputs, yielding a general margin-versus-collision criterion for identifying when prompting strategies improve fresh-symbol generalization. Synthetic template experiments illustrate the predicted roles of regularization, sample size, and transformer-kernel structure.
MLJul 11, 2024
Causal inference through multi-stage learning and doubly robust deep neural networksYuqian Zhang, Jelena Bradic
Deep neural networks (DNNs) have demonstrated remarkable empirical performance in large-scale supervised learning problems, particularly in scenarios where both the sample size $n$ and the dimension of covariates $p$ are large. This study delves into the application of DNNs across a wide spectrum of intricate causal inference tasks, where direct estimation falls short and necessitates multi-stage learning. Examples include estimating the conditional average treatment effect and dynamic treatment effect. In this framework, DNNs are constructed sequentially, with subsequent stages building upon preceding ones. To mitigate the impact of estimation errors from early stages on subsequent ones, we integrate DNNs in a doubly robust manner. In contrast to previous research, our study offers theoretical assurances regarding the effectiveness of DNNs in settings where the dimensionality $p$ expands with the sample size. These findings are significant independently and extend to degenerate single-stage learning problems.
MLFeb 17, 2024
Adaptive Split Balancing for Optimal Random ForestYuqian Zhang, Weijie Ji, Jelena Bradic
In this paper, we propose a new random forest algorithm that constructs the trees using a novel adaptive split-balancing method. Rather than relying on the widely-used random feature selection, we propose a permutation-based balanced splitting criterion. The adaptive split balancing forest (ASBF), achieves minimax optimality under the Lipschitz class. Its localized version, which fits local regressions at the leaf level, attains the minimax rate under the broad Hölder class $\mathcal{H}^{q,β}$ of problems for any $q\in\mathbb{N}$ and $β\in(0,1]$. We identify that over-reliance on auxiliary randomness in tree construction may compromise the approximation power of trees, leading to suboptimal results. Conversely, the proposed less random, permutation-based approach demonstrates optimality over a wide range of models. Although random forests are known to perform well empirically, their theoretical convergence rates are slow. Simplified versions that construct trees without data dependence offer faster rates but lack adaptability during tree growth. Our proposed method achieves optimality in simple, smooth scenarios while adaptively learning the tree structure from the data. Additionally, we establish uniform upper bounds and demonstrate that ASBF improves dimensionality dependence in average treatment effect estimation problems. Simulation studies and real-world applications demonstrate our methods' superior performance over existing random forests.
MENov 12, 2021
Dynamic treatment effects: high-dimensional inference under model misspecificationYuqian Zhang, Weijie Ji, Jelena Bradic
Estimating dynamic treatment effects is crucial across various disciplines, providing insights into the time-dependent causal impact of interventions. However, this estimation poses challenges due to time-varying confounding, leading to potentially biased estimates. Furthermore, accurately specifying the growing number of treatment assignments and outcome models with multiple exposures appears increasingly challenging to accomplish. Double robustness, which permits model misspecification, holds great value in addressing these challenges. This paper introduces a novel "sequential model doubly robust" estimator. We develop novel moment-targeting estimates to account for confounding effects and establish that root-$N$ inference can be achieved as long as at least one nuisance model is correctly specified at each exposure time, despite the presence of high-dimensional covariates. Although the nuisance estimates themselves do not achieve root-$N$ rates, the carefully designed loss functions in our framework ensure final root-$N$ inference for the causal parameter of interest. Unlike off-the-shelf high-dimensional methods, which fail to deliver robust inference under model misspecification even within the doubly robust framework, our newly developed loss functions address this limitation effectively.
MEOct 10, 2021
High-dimensional Inference for Dynamic Treatment EffectsJelena Bradic, Weijie Ji, Yuqian Zhang
Estimating dynamic treatment effects is a crucial endeavor in causal inference, particularly when confronted with high-dimensional confounders. Doubly robust (DR) approaches have emerged as promising tools for estimating treatment effects due to their flexibility. However, we showcase that the traditional DR approaches that only focus on the DR representation of the expected outcomes may fall short of delivering optimal results. In this paper, we propose a novel DR representation for intermediate conditional outcome models that leads to superior robustness guarantees. The proposed method achieves consistency even with high-dimensional confounders, as long as at least one nuisance function is appropriately parametrized for each exposure time and treatment path. Our results represent a significant step forward as they provide new robustness guarantees. The key to achieving these results is our new DR representation, which offers superior inferential performance while requiring weaker assumptions. Lastly, we confirm our findings in practice through simulations and a real data application.
MEApr 14, 2021
Double Robust Semi-Supervised Inference for the Mean: Selection Bias under MAR Labeling with Decaying OverlapYuqian Zhang, Abhishek Chakrabortty, Jelena Bradic
Semi-supervised (SS) inference has received much attention in recent years. Apart from a moderate-sized labeled data, L, the SS setting is characterized by an additional, much larger sized, unlabeled data, U. The setting of |U| >> |L|, makes SS inference unique and different from the standard missing data problems, owing to natural violation of the so-called "positivity" or "overlap" assumption. However, most of the SS literature implicitly assumes L and U to be equally distributed, i.e., no selection bias in the labeling. Inferential challenges in missing at random (MAR) type labeling allowing for selection bias, are inevitably exacerbated by the decaying nature of the propensity score (PS). We address this gap for a prototype problem, the estimation of the response's mean. We propose a double robust SS (DRSS) mean estimator and give a complete characterization of its asymptotic properties. The proposed estimator is consistent as long as either the outcome or the PS model is correctly specified. When both models are correctly specified, we provide inference results with a non-standard consistency rate that depends on the smaller size |L|. The results are also extended to causal inference with imbalanced treatment groups. Further, we provide several novel choices of models and estimators of the decaying PS, including a novel offset logistic model and a stratified labeling model. We present their properties under both high and low dimensional settings. These may be of independent interest. Lastly, we present extensive simulations and also a real data application.
MLMar 21, 2021
Comments on Leo Breiman's paper 'Statistical Modeling: The Two Cultures' (Statistical Science, 2001, 16(3), 199-231)Jelena Bradic, Yinchu Zhu
Breiman challenged statisticians to think more broadly, to step into the unknown, model-free learning world, with him paving the way forward. Statistics community responded with slight optimism, some skepticism, and plenty of disbelief. Today, we are at the same crossroad anew. Faced with the enormous practical success of model-free, deep, and machine learning, we are naturally inclined to think that everything is resolved. A new frontier has emerged; the one where the role, impact, or stability of the {\it learning} algorithms is no longer measured by prediction quality, but an inferential one -- asking the questions of {\it why} and {\it if} can no longer be safely ignored.
EMMar 1, 2021
Dynamic covariate balancing: estimating treatment effects over time with potential local projectionsDavide Viviano, Jelena Bradic
This paper studies the estimation and inference of treatment effects in panel data settings when treatments change dynamically over time. We propose a balancing method that allows for (i) treatments to be assigned dynamically over time based on high-dimensional covariates, past outcomes, and treatments; (ii) outcomes and time-varying covariates to depend on the trajectory of all past treatments; (iii) heterogeneity of treatment effects. Our approach recursively projects potential outcomes' expectations on past histories. It then controls the bias arising from the non-experimental and sequential nature of this setting by balancing dynamically observable characteristics over time. We establish inferential guarantees of the proposed method even when the number of observable characteristics significantly exceeds the sample size. We study numerical properties of the estimator and illustrate the benefits of the procedure in an empirical application.
LGFeb 1, 2021
Learning to Combat Noisy Labels via Classification MarginsJason Z. Lin, Jelena Bradic
A deep neural network trained on noisy labels is known to quickly lose its power to discriminate clean instances from noisy ones. After the early learning phase has ended, the network memorizes the noisy instances, which leads to a significant degradation in its generalization performance. To resolve this issue, we propose MARVEL (MARgins Via Early Learning), a new robust learning method where the memorization of the noisy instances is curbed. We propose a new test statistic that tracks the goodness of "fit" of every instance based on the epoch-history of its classification margins. If its classification margin is small in a sequence of consecutive learning epochs, that instance is declared noisy and the network abandons learning on it. Consequently, the network first flags a possibly noisy instance, and then waits to see if learning on that instance can be improved and if not, the network learns with confidence that this instance can be safely abandoned. We also propose MARVEL+, where arduous instances can be upweighted, enabling the network to focus and improve its learning on them and consequently its generalization. Experimental results on benchmark datasets with synthetic label noise and real-world datasets show that MARVEL outperforms other baselines consistently across different noise levels, with a significantly larger margin under asymmetric noise.
MLJul 26, 2020
DeepHazard: neural network for time-varying risksDenise Rava, Jelena Bradic
Prognostic models in survival analysis are aimed at understanding the relationship between patients' covariates and the distribution of survival time. Traditionally, semi-parametric models, such as the Cox model, have been assumed. These often rely on strong proportionality assumptions of the hazard that might be violated in practice. Moreover, they do not often include covariate information updated over time. We propose a new flexible method for survival prediction: DeepHazard, a neural network for time-varying risks. Our approach is tailored for a wide range of continuous hazards forms, with the only restriction of being additive in time. A flexible implementation, allowing different optimization methods, along with any norm penalty, is developed. Numerical examples illustrate that our approach outperforms existing state-of-the-art methodology in terms of predictive capability evaluated through the C-index metric. The same is revealed on the popular real datasets as METABRIC, GBSG, and ACTG.
STJun 12, 2020
Detangling robustness in high dimensions: composite versus model-averaged estimationJing Zhou, Gerda Claeskens, Jelena Bradic
Robust methods, though ubiquitous in practice, are yet to be fully understood in the context of regularized estimation and high dimensions. Even simple questions become challenging very quickly. For example, classical statistical theory identifies equivalence between model-averaged and composite quantile estimation. However, little to nothing is known about such equivalence between methods that encourage sparsity. This paper provides a toolbox to further study robustness in these settings and focuses on prediction. In particular, we study optimally weighted model-averaged as well as composite $l_1$-regularized estimation. Optimal weights are determined by minimizing the asymptotic mean squared error. This approach incorporates the effects of regularization, without the assumption of perfect selection, as is often used in practice. Such weights are then optimal for prediction quality. Through an extensive simulation study, we show that no single method systematically outperforms others. We find, however, that model-averaged and composite quantile estimators often outperform least-squares methods, even in the case of Gaussian model noise. Real data application witnesses the method's practical use through the reconstruction of compressed audio signals.
EMMay 25, 2020
Fair Policy TargetingDavide Viviano, Jelena Bradic
One of the major concerns of targeting interventions on individuals in social welfare programs is discrimination: individualized treatments may induce disparities across sensitive attributes such as age, gender, or race. This paper addresses the question of the design of fair and efficient treatment allocation rules. We adopt the non-maleficence perspective of first do no harm: we select the fairest allocation within the Pareto frontier. We cast the optimization into a mixed-integer linear program formulation, which can be solved using off-the-shelf algorithms. We derive regret bounds on the unfairness of the estimated policy function and small sample guarantees on the Pareto frontier under general notions of fairness. Finally, we illustrate our method using an application from education economics.
MLJan 8, 2020
Censored Quantile Regression ForestAlexander Hanbo Li, Jelena Bradic
Random forests are powerful non-parametric regression method but are severely limited in their usage in the presence of randomly censored observations, and naively applied can exhibit poor predictive performance due to the incurred biases. Based on a local adaptive representation of random forests, we develop its regression adjustment for randomly censored regression quantile models. Regression adjustment is based on a new estimating equation that adapts to censoring and leads to quantile score whenever the data do not exhibit censoring. The proposed procedure named {\it censored quantile regression forest}, allows us to estimate quantiles of time-to-event without any parametric modeling assumption. We establish its consistency under mild model specifications. Numerical studies showcase a clear advantage of the proposed procedure.
STDec 27, 2019
Minimax Semiparametric Learning With Approximate SparsityJelena Bradic, Victor Chernozhukov, Whitney K. Newey et al.
Estimating linear, mean-square continuous functionals is a pivotal challenge in statistics. In high-dimensional contexts, this estimation is often performed under the assumption of exact model sparsity, meaning that only a small number of parameters are precisely non-zero. This excludes models where linear formulations only approximate the underlying data distribution, such as nonparametric regression methods that use basis expansion such as splines, kernel methods or polynomial regressions. Many recent methods for root-$n$ estimation have been proposed, but the implications of exact model sparsity remain largely unexplored. In particular, minimax optimality for models that are not exactly sparse has not yet been developed. This paper formalizes the concept of approximate sparsity through classical semi-parametric theory. We derive minimax rates under this formulation for a regression slope and an average derivative, finding these bounds to be substantially larger than those in low-dimensional, semi-parametric settings. We identify several new phenomena. We discover new regimes where rate double robustness does not hold, yet root-$n$ estimation is still possible. In these settings, we propose an estimator that achieves minimax optimal rates. Our findings further reveal distinct optimality boundaries for ordered versus unordered nonparametric regression estimation.
MEJun 29, 2019
Estimating Treatment Effect under Additive Hazards Models with High-dimensional CovariatesJue Hou, Jelena Bradic, Ronghui Xu
Estimating causal effects for survival outcomes in the high-dimensional setting is an extremely important topic for many biomedical applications as well as areas of social sciences. We propose a new orthogonal score method for treatment effect estimation and inference that results in asymptotically valid confidence intervals assuming only good estimation properties of the hazard outcome model and the conditional probability of treatment. This guarantee allows us to provide valid inference for the conditional treatment effect under the high-dimensional additive hazards model under considerably more generality than existing approaches. In addition, we develop a new Hazards Difference (HDi), estimator. We showcase that our approach has double-robustness properties in high dimensions: with cross-fitting, the HDi estimate is consistent under a wide variety of treatment assignment models; the HDi estimate is also consistent when the hazards model is misspecified and instead the true data generating mechanism follows a partially linear additive hazards model. We further develop a novel sparsity doubly robust result, where either the outcome or the treatment model can be a fully dense high-dimensional model. We apply our methods to study the treatment effect of radical prostatectomy versus conservative management for prostate cancer patients using the SEER-Medicare Linked Data.
MEApr 2, 2019
Synthetic learner: model-free inference on treatments over timeDavide Viviano, Jelena Bradic
Understanding the effect of a particular treatment or a policy pertains to many areas of interest, ranging from political economics, marketing to healthcare. In this paper, we develop a non-parametric algorithm for detecting the effects of treatment over time in the context of Synthetic Controls. The method builds on counterfactual predictions from many algorithms without necessarily assuming that the algorithms correctly capture the model. We introduce an inferential procedure for detecting treatment effects and show that the testing procedure is asymptotically valid for stationary, beta mixing processes without imposing any restriction on the set of base algorithms under consideration. We discuss consistency guarantees for average treatment effect estimates and derive regret bounds for the proposed methodology. The class of algorithms may include Random Forest, Lasso, or any other machine-learning estimator. Numerical studies and an application illustrate the advantages of the method.
MLFeb 8, 2019
Censored Quantile Regression ForestsAlexander Hanbo Li, Jelena Bradic
Random forests are powerful non-parametric regression method but are severely limited in their usage in the presence of randomly censored observations, and naively applied can exhibit poor predictive performance due to the incurred biases. Based on a local adaptive representation of random forests, we develop its regression adjustment for randomly censored regression quantile models. Regression adjustment is based on new estimating equations that adapt to censoring and lead to quantile score whenever the data do not exhibit censoring. The proposed procedure named censored quantile regression forest, allows us to estimate quantiles of time-to-event without any parametric modeling assumption. We establish its consistency under mild model specifications. Numerical studies showcase a clear advantage of the proposed procedure.
MEFeb 2, 2019
High-dimensional semi-supervised learning: in search for optimal inference of the meanYuqian Zhang, Jelena Bradic
We provide a high-dimensional semi-supervised inference framework focused on the mean and variance of the response. Our data are comprised of an extensive set of observations regarding the covariate vectors and a much smaller set of labeled observations where we observe both the response as well as the covariates. We allow the size of the covariates to be much larger than the sample size and impose weak conditions on a statistical form of the data. We provide new estimators of the mean and variance of the response that extend some of the recent results presented in low-dimensional models. In particular, at times we will not necessitate consistent estimation of the functional form of the data. Together with estimation of the population mean and variance, we provide their asymptotic distribution and confidence intervals where we showcase gains in efficiency compared to the sample mean and variance. Our procedure, with minor modifications, is then presented to make important contributions regarding inference about average treatment effects. We also investigate the robustness of estimation and coverage and showcase widespread applicability and generality of the proposed method.
STFeb 26, 2018
Testability of high-dimensional linear models with non-sparse structuresJelena Bradic, Jianqing Fan, Yinchu Zhu
Understanding statistical inference under possibly non-sparse high-dimensional models has gained much interest recently. For a given component of the regression coefficient, we show that the difficulty of the problem depends on the sparsity of the corresponding row of the precision matrix of the covariates, not the sparsity of the regression coefficients. We develop new concepts of uniform and essentially uniform non-testability that allow the study of limitations of tests across a broad set of alternatives. Uniform non-testability identifies a collection of alternatives such that the power of any test, against any alternative in the group, is asymptotically at most equal to the nominal size. Implications of the new constructions include new minimax testability results that, in sharp contrast to the current results, do not depend on the sparsity of the regression parameters. We identify new tradeoffs between testability and feature correlation. In particular, we show that, in models with weak feature correlations, minimax lower bound can be attained by a test whose power has the $\sqrt{n}$ rate, regardless of the size of the model sparsity.
MEAug 14, 2017
Fixed effects testing in high-dimensional linear mixed modelsJelena Bradic, Gerda Claeskens, Thomas Gueuning
Many scientific and engineering challenges -- ranging from pharmacokinetic drug dosage allocation and personalized medicine to marketing mix (4Ps) recommendations -- require an understanding of the unobserved heterogeneity in order to develop the best decision making-processes. In this paper, we develop a hypothesis test and the corresponding p-value for testing for the significance of the homogeneous structure in linear mixed models. A robust matching moment construction is used for creating a test that adapts to the size of the model sparsity. When unobserved heterogeneity at a cluster level is constant, we show that our test is both consistent and unbiased even when the dimension of the model is extremely high. Our theoretical results rely on a new family of adaptive sparse estimators of the fixed effects that do not require consistent estimation of the random effects. Moreover, our inference results do not require consistent model selection. We showcase that moment matching can be extended to nonlinear mixed effects models and to generalized linear mixed effects models. In numerical and real data experiments, we find that the developed method is extremely accurate, that it adapts to the size of the underlying model and is decidedly powerful in the presence of irrelevant covariates.
MEAug 1, 2017
Breaking the curse of dimensionality in regressionYinchu Zhu, Jelena Bradic
Models with many signals, high-dimensional models, often impose structures on the signal strengths. The common assumption is that only a few signals are strong and most of the signals are zero or close (collectively) to zero. However, such a requirement might not be valid in many real-life applications. In this article, we are interested in conducting large-scale inference in models that might have signals of mixed strengths. The key challenge is that the signals that are not under testing might be collectively non-negligible (although individually small) and cannot be accurately learned. This article develops a new class of tests that arise from a moment matching formulation. A virtue of these moment-matching statistics is their ability to borrow strength across features, adapt to the sparsity size and exert adjustment for testing growing number of hypothesis. GRoup-level Inference of Parameter, GRIP, test harvests effective sparsity structures with hypothesis formulation for an efficient multiple testing procedure. Simulated data showcase that GRIPs error control is far better than the alternative methods. We develop a minimax theory, demonstrating optimality of GRIP for a broad range of models, including those where the model is a mixture of a sparse and high-dimensional dense signals.
MEJul 29, 2017
Fine-Gray competing risks model with high-dimensional covariates: estimation and InferenceJue Hou, Jelena Bradic, Ronghui Xu
The purpose of this paper is to construct confidence intervals for the regression coefficients in the Fine-Gray model for competing risks data with random censoring, where the number of covariates can be larger than the sample size. Despite strong motivation from biomedical applications, a high-dimensional Fine-Gray model has attracted relatively little attention among the methodological or theoretical literature. We fill in this gap by developing confidence intervals based on a one-step bias-correction for a regularized estimation. We develop a theoretical framework for the partial likelihood, which does not have independent and identically distributed entries and therefore presents many technical challenges. We also study the approximation error from the weighting scheme under random censoring for competing risks and establish new concentration results for time-dependent processes. In addition to the theoretical results and algorithms, we present extensive numerical experiments and an application to a study of non-cancer mortality among prostate cancer patients using the linked Medicare-SEER data.
MEMay 6, 2017
Comments on `High-dimensional simultaneous inference with the bootstrap'Jelena Bradic, Yinchu Zhu
We provide comments on the article "High-dimensional simultaneous inference with the bootstrap" by Ruben Dezeure, Peter Buhlmann and Cun-Hui Zhang.
MEMay 2, 2017
A projection pursuit framework for testing general high-dimensional hypothesisYinchu Zhu, Jelena Bradic
This article develops a framework for testing general hypothesis in high-dimensional models where the number of variables may far exceed the number of observations. Existing literature has considered less than a handful of hypotheses, such as testing individual coordinates of the model parameter. However, the problem of testing general and complex hypotheses remains widely open. We propose a new inference method developed around the hypothesis adaptive projection pursuit framework, which solves the testing problems in the most general case. The proposed inference is centered around a new class of estimators defined as $l_1$ projection of the initial guess of the unknown onto the space defined by the null. This projection automatically takes into account the structure of the null hypothesis and allows us to study formal inference for a number of long-standing problems. For example, we can directly conduct inference on the sparsity level of the model parameters and the minimum signal strength. This is especially significant given the fact that the former is a fundamental condition underlying most of the theoretical development in high-dimensional statistics, while the latter is a key condition used to establish variable selection properties. Moreover, the proposed method is asymptotically exact and has satisfactory power properties for testing very general functionals of the high-dimensional parameters. The simulation studies lend further support to our theoretical claims and additionally show excellent finite-sample size and power properties of the proposed test.
MLFeb 20, 2017
Uniform Inference for High-dimensional Quantile Regression: Linear Functionals and Regression Rank ScoresJelena Bradic, Mladen Kolar
Hypothesis tests in models whose dimension far exceeds the sample size can be formulated much like the classical studentized tests only after the initial bias of estimation is removed successfully. The theory of debiased estimators can be developed in the context of quantile regression models for a fixed quantile value. However, it is frequently desirable to formulate tests based on the quantile regression process, as this leads to more robust tests and more stable confidence sets. Additionally, inference in quantile regression requires estimation of the so called sparsity function, which depends on the unknown density of the error. In this paper we consider a debiasing approach for the uniform testing problem. We develop high-dimensional regression rank scores and show how to use them to estimate the sparsity function, as well as how to adapt them for inference involving the quantile regression process. Furthermore, we develop a Kolmogorov-Smirnov test in a location-shift high-dimensional models and confidence sets that are uniformly valid for many quantile values. The main technical result are the development of a Bahadur representation of the debiasing estimator that is uniform over a range of quantiles and uniform convergence of the quantile process to the Brownian bridge process, which are of independent interest. Simulation studies illustrate finite sample properties of our procedure.
STOct 14, 2016
Two-sample testing in non-sparse high-dimensional linear modelsYinchu Zhu, Jelena Bradic
In analyzing high-dimensional models, sparsity of the model parameter is a common but often undesirable assumption. In this paper, we study the following two-sample testing problem: given two samples generated by two high-dimensional linear models, we aim to test whether the regression coefficients of the two linear models are identical. We propose a framework named TIERS (short for TestIng Equality of Regression Slopes), which solves the two-sample testing problem without making any assumptions on the sparsity of the regression parameters. TIERS builds a new model by convolving the two samples in such a way that the original hypothesis translates into a new moment condition. A self-normalization construction is then developed to form a moment test. We provide rigorous theory for the developed framework. Under very weak conditions of the feature covariance, we show that the accuracy of the proposed test in controlling Type I errors is robust both to the lack of sparsity in the features and to the heavy tails in the error distribution, even when the sample size is much smaller than the feature dimension. Moreover, we discuss minimax optimality and efficiency properties of the proposed test. Simulation analysis demonstrates excellent finite-sample performance of our test. In deriving the test, we also develop tools that are of independent interest. The test is built upon a novel estimator, called Auto-aDaptive Dantzig Selector (ADDS), which not only automatically chooses an appropriate scale of the error term but also incorporates prior information. To effectively approximate the critical value of the test statistic, we develop a novel high-dimensional plug-in approach that complements the recent advances in Gaussian approximation theory.
MEOct 10, 2016
Linear Hypothesis Testing in Dense High-Dimensional Linear ModelsYinchu Zhu, Jelena Bradic
We propose a methodology for testing linear hypothesis in high-dimensional linear models. The proposed test does not impose any restriction on the size of the model, i.e. model sparsity or the loading vector representing the hypothesis. Providing asymptotically valid methods for testing general linear functions of the regression parameters in high-dimensions is extremely challenging -- especially without making restrictive or unverifiable assumptions on the number of non-zero elements. We propose to test the moment conditions related to the newly designed restructured regression, where the inputs are transformed and augmented features. These new features incorporate the structure of the null hypothesis directly. The test statistics are constructed in such a way that lack of sparsity in the original model parameter does not present a problem for the theoretical justification of our procedures. We establish asymptotically exact control on Type I error without imposing any sparsity assumptions on model parameter or the vector representing the linear hypothesis. Our method is also shown to achieve certain optimality in detecting deviations from the null hypothesis. We demonstrate the favorable finite-sample performance of the proposed methods, via a number of numerical and a real data example.
MEOct 7, 2016
Significance testing in non-sparse high-dimensional linear modelsYinchu Zhu, Jelena Bradic
In high-dimensional linear models, the sparsity assumption is typically made, stating that most of the parameters are equal to zero. Under the sparsity assumption, estimation and, recently, inference have been well studied. However, in practice, sparsity assumption is not checkable and more importantly is often violated; a large number of covariates might be expected to be associated with the response, indicating that possibly all, rather than just a few, parameters are non-zero. A natural example is a genome-wide gene expression profiling, where all genes are believed to affect a common disease marker. We show that existing inferential methods are sensitive to the sparsity assumption, and may, in turn, result in the severe lack of control of Type-I error. In this article, we propose a new inferential method, named CorrT, which is robust to model misspecification such as heteroscedasticity and lack of sparsity. CorrT is shown to have Type I error approaching the nominal level for \textit{any} models and Type II error approaching zero for sparse and many dense models. In fact, CorrT is also shown to be optimal in a variety of frameworks: sparse, non-sparse and hybrid models where sparse and dense signals are mixed. Numerical experiments show a favorable performance of the CorrT test compared to the state-of-the-art methods.
STSep 22, 2016
Robust Confidence Intervals in High-Dimensional Left-Censored RegressionJelena Bradic, Jiaqi Guo
This paper develops robust confidence intervals in high-dimensional and left-censored regression. Type-I censored regression models are extremely common in practice, where a competing event makes the variable of interest unobservable. However, techniques developed for entirely observed data do not directly apply to the censored observations. In this paper, we develop smoothed estimating equations that augment the de-biasing method, such that the resulting estimator is adaptive to censoring and is more robust to the misspecification of the error distribution. We propose a unified class of robust estimators, including Mallow's, Schweppe's and Hill-Ryan's one-step estimator. In the ultra-high-dimensional setting, where the dimensionality can grow exponentially with the sample size, we show that as long as the preliminary estimator converges faster than $n^{-1/4}$, the one-step estimator inherits asymptotic distribution of fully iterated version. Moreover, we show that the size of the residuals of the Bahadur representation matches those of the simple linear models, $s^{3/4 } (\log (p \vee n))^{3/4} / n^{1/4}$ -- that is, the effects of censoring asymptotically disappear. Simulation studies demonstrate that our method is adaptive to the censoring level and asymmetry in the error distribution, and does not lose efficiency when the errors are from symmetric distributions. Finally, we apply the developed method to a real data set from the MAQC-II repository that is related to the HIV-1 study.
MLOct 5, 2015
Boosting in the presence of outliers: adaptive classification with non-convex loss functionsAlexander Hanbo Li, Jelena Bradic
This paper examines the role and efficiency of the non-convex loss functions for binary classification problems. In particular, we investigate how to design a simple and effective boosting algorithm that is robust to the outliers in the data. The analysis of the role of a particular non-convex loss for prediction accuracy varies depending on the diminishing tail properties of the gradient of the loss -- the ability of the loss to efficiently adapt to the outlying data, the local convex properties of the loss and the proportion of the contaminated data. In order to use these properties efficiently, we propose a new family of non-convex losses named $γ$-robust losses. Moreover, we present a new boosting framework, {\it Arch Boost}, designed for augmenting the existing work such that its corresponding classification algorithm is significantly more adaptable to the unknown data contamination. Along with the Arch Boosting framework, the non-convex losses lead to the new class of boosting algorithms, named adaptive, robust, boosting (ARB). Furthermore, we present theoretical examples that demonstrate the robustness properties of the proposed algorithms. In particular, we develop a new breakdown point analysis and a new influence function analysis that demonstrate gains in robustness. Moreover, we present new theoretical results, based only on local curvatures, which may be used to establish statistical and optimization properties of the proposed Arch boosting algorithms with highly non-convex loss functions. Extensive numerical calculations are used to illustrate these theoretical properties and reveal advantages over the existing boosting methods when data exhibits a number of outliers.
STJul 31, 2015
Robustness in sparse linear models: relative efficiency based on robust approximate message passingJelena Bradic
Understanding efficiency in high dimensional linear models is a longstanding problem of interest. Classical work with smaller dimensional problems dating back to Huber and Bickel has illustrated the benefits of efficient loss functions. When the number of parameters $p$ is of the same order as the sample size $n$, $p \approx n$, an efficiency pattern different from the one of Huber was recently established. In this work, we consider the effects of model selection on the estimation efficiency of penalized methods. In particular, we explore whether sparsity, results in new efficiency patterns when $p > n$. In the interest of deriving the asymptotic mean squared error for regularized M-estimators, we use the powerful framework of approximate message passing. We propose a novel, robust and sparse approximate message passing algorithm (RAMP), that is adaptive to the error distribution. Our algorithm includes many non-quadratic and non-differentiable loss functions. We derive its asymptotic mean squared error and show its convergence, while allowing $p, n, s \to \infty$, with $n/p \in (0,1)$ and $n/s \in (1,\infty)$. We identify new patterns of relative efficiency regarding a number of penalized $M$ estimators, when $p$ is much larger than $n$. We show that the classical information bound is no longer reachable, even for light--tailed error distributions. We show that the penalized least absolute deviation estimator dominates the penalized least square estimator, in cases of heavy--tailed distributions. We observe this pattern for all choices of the number of non-zero parameters $s$, both $s \leq n$ and $s \approx n$. In non-penalized problems where $s =p \approx n$, the opposite regime holds. Therefore, we discover that the presence of model selection significantly changes the efficiency patterns.
STJun 14, 2013
Randomized maximum-contrast selection: subagging for large-scale regressionJelena Bradic
We introduce a very general method for sparse and large-scale variable selection. The large-scale regression settings is such that both the number of parameters and the number of samples are extremely large. The proposed method is based on careful combination of penalized estimators, each applied to a random projection of the sample space into a low-dimensional space. In one special case that we study in detail, the random projections are divided into non-overlapping blocks; each consisting of only a small portion of the original data. Within each block we select the projection yielding the smallest out-of-sample error. Our random ensemble estimator then aggregates the results according to new maximal-contrast voting scheme to determine the final selected set. Our theoretical results illuminate the effect on performance of increasing the number of non-overlapping blocks. Moreover, we demonstrate that statistical optimality is retained along with the computational speedup. The proposed method achieves minimax rates for approximate recovery over all estimators using the full set of samples. Furthermore, our theoretical results allow the number of subsamples to grow with the subsample size and do not require irrepresentable condition. The estimator is also compared empirically with several other popular high-dimensional estimators via an extensive simulation study, which reveals its excellent finite-sample performance.
STJul 18, 2012
Structured Estimation in Nonparameteric Cox ModelJelena Bradic, Rui Song
To better understand the interplay of censoring and sparsity we develop finite sample properties of nonparametric Cox proportional hazard's model. Due to high impact of sequencing data, carrying genetic information of each individual, we work with over-parametrized problem and propose general class of group penalties suitable for sparse structured variable selection and estimation. Novel non-asymptotic sandwich bounds for the partial likelihood are developed. We establish how they extend notion of local asymptotic normality (LAN) of Le Cam's. Such non-asymptotic LAN principles are further extended to high dimensional spaces where $p \gg n$. Finite sample prediction properties of penalized estimator in non-parametric Cox proportional hazards model, under suitable censoring conditions, agree with those of penalized estimator in linear models.