Venkatesh Saligrama

LG
h-index39
95papers
8,779citations
Novelty57%
AI Score62

95 Papers

19.5AIMay 29
PReMISE: Policy Rubrics as Measurement Specifications for LLM Judges

Swastik Roy, Rajkumar Pujari, Tharindu Kumarage et al.

LLM judges are increasingly used to evaluate open-ended responses, but their scores depend strongly on the rubrics that condition them. A vague rubric asking for a response to be ``helpful and factual'' can reward polished answers that invent facts or violate user intent. We treat reusable rubrics as measurement specifications: changing the rubric changes the response quality measurement induced by a fixed judge. We introduce PReMISE, a framework that, given pairwise human-preference data, (i) discovers a policy-level rubric set, and (ii) audits any rubric set under LLM-judge use along four axes: structural adequacy, reliability, preference fit, and adversarial robustness. Across rubric sources no raw source is simultaneously reliable, preference-predictive, and adversarially robust; and high inter-rater agreement does not imply low exploitability. PReMISE is the only rubric source to score non-trivially on applicability, specificity, and effective dimensionality simultaneously. We contribute two audit-targeted repair operations: preference-rank selection raises judge accuracy on paired responses from $65.0\%$ to $68.6\%$, competitive with the strongest rubric-discovery baselines and leading on two of three judges in our cross-judge sweep; reliability-constrained refinement reduces the rate at which exploit responses receive high scores from $46.4\%$ to $36.0\%$ with little change in inter-judge agreement ($α{=}.531\to.519$).

6.8CVNov 10, 2023Code
Learning Human Action Recognition Representations Without Real Humans

Howard Zhong, Samarth Mishra, Donghyun Kim et al.

Pre-training on massive video datasets has become essential to achieve high action recognition performance on smaller downstream datasets. However, most large-scale video datasets contain images of people and hence are accompanied with issues related to privacy, ethics, and data protection, often preventing them from being publicly shared for reproducible research. Existing work has attempted to alleviate these problems by blurring faces, downsampling videos, or training on synthetic data. On the other hand, analysis on the transferability of privacy-preserving pre-trained models to downstream tasks has been limited. In this work, we study this problem by first asking the question: can we pre-train models for human action recognition with data that does not include real humans? To this end, we present, for the first time, a benchmark that leverages real-world videos with humans removed and synthetic data containing virtual humans to pre-train a model. We then evaluate the transferability of the representation learned on this data to a diverse set of downstream action recognition benchmarks. Furthermore, we propose a novel pre-training strategy, called Privacy-Preserving MAE-Align, to effectively combine synthetic data and human-removed real data. Our approach outperforms previous baselines by up to 5% and closes the performance gap between human and no-human action recognition representations on downstream tasks, for both linear probing and fine-tuning. Our benchmark, code, and models are available at https://github.com/howardzh01/PPMA .

15.1LGApr 1, 2022
Strategies for Safe Multi-Armed Bandits with Logarithmic Regret and Risk

Tianrui Chen, Aditya Gangrade, Venkatesh Saligrama

We investigate a natural but surprisingly unstudied approach to the multi-armed bandit problem under safety risk constraints. Each arm is associated with an unknown law on safety risks and rewards, and the learner's goal is to maximise reward whilst not playing unsafe arms, as determined by a given threshold on the mean risk. We formulate a pseudo-regret for this setting that enforces this safety constraint in a per-round way by softly penalising any violation, regardless of the gain in reward due to the same. This has practical relevance to scenarios such as clinical trials, where one must maintain safety for each round rather than in an aggregated sense. We describe doubly optimistic strategies for this scenario, which maintain optimistic indices for both safety risk and reward. We show that schema based on both frequentist and Bayesian indices satisfy tight gap-dependent logarithmic regret bounds, and further that these play unsafe arms only logarithmically many times in total. This theoretical analysis is complemented by simulation studies demonstrating the effectiveness of the proposed schema, and probing the domains in which their use is appropriate.

15.1LGSep 27, 2022
Safe Linear Bandits over Unknown Polytopes

Aditya Gangrade, Tianrui Chen, Venkatesh Saligrama

The safe linear bandit problem (SLB) is an online approach to linear programming with unknown objective and unknown roundwise constraints, under stochastic bandit feedback of rewards and safety risks of actions. We study the tradeoffs between efficacy and smooth safety costs of SLBs over polytopes, and the role of aggressive doubly-optimistic play in avoiding the strong assumptions made by extant pessimistic-optimistic approaches. We first elucidate an inherent hardness in SLBs due the lack of knowledge of constraints: there exist `easy' instances, for which suboptimal extreme points have large `gaps', but on which SLB methods must still incur $Ω(\sqrt{T})$ regret or safety violations, due to an inability to resolve unknown optima to arbitrary precision. We then analyse a natural doubly-optimistic strategy for the safe linear bandit problem, DOSS, which uses optimistic estimates of both reward and safety risks to select actions, and show that despite the lack of knowledge of constraints or feasible points, DOSS simultaneously obtains tight instance-dependent $O(\log^2 T)$ bounds on efficacy regret, and $\tilde O(\sqrt{T})$ bounds on safety violations. Further, when safety is demanded to a finite precision, violations improve to $O(\log^2 T).$ These results rely on a novel dual analysis of linear bandits: we argue that \algoname proceeds by activating noisy versions of at least $d$ constraints in each round, which allows us to separately analyse rounds where a `poor' set of constraints is activated, and rounds where `good' sets of constraints are activated. The costs in the former are controlled to $O(\log^2 T)$ by developing new dual notions of gaps, based on global sensitivity analyses of linear programs, that quantify the suboptimality of each such set of constraints. The latter costs are controlled to $O(1)$ by explicitly analysing the solutions of optimistic play.

9.1CVSep 21, 2023
Learning to Drive Anywhere

Ruizhao Zhu, Peng Huang, Eshed Ohn-Bar et al.

Human drivers can seamlessly adapt their driving decisions across geographical locations with diverse conditions and rules of the road, e.g., left vs. right-hand traffic. In contrast, existing models for autonomous driving have been thus far only deployed within restricted operational domains, i.e., without accounting for varying driving behaviors across locations or model scalability. In this work, we propose AnyD, a single geographically-aware conditional imitation learning (CIL) model that can efficiently learn from heterogeneous and globally distributed data with dynamic environmental, traffic, and social characteristics. Our key insight is to introduce a high-capacity geo-location-based channel attention mechanism that effectively adapts to local nuances while also flexibly modeling similarities among regions in a data-driven manner. By optimizing a contrastive imitation objective, our proposed approach can efficiently scale across inherently imbalanced data distributions and location-dependent events. We demonstrate the benefits of our AnyD agent across multiple datasets, cities, and scalable deployment paradigms, i.e., centralized, semi-supervised, and distributed agent training. Specifically, AnyD outperforms CIL baselines by over 14% in open-loop evaluation and 30% in closed-loop testing on CARLA.

1.2NIMar 11, 2011
A Token Based Algorithm to Distributed Computation in Sensor Networks

Venkatesh Saligrama, Murat Alanyali

We consider distributed algorithms for data aggregation and function computation in sensor networks. The algorithms perform pairwise computations along edges of an underlying communication graph. A token is associated with each sensor node, which acts as a transmission permit. Nodes with active tokens have transmission permits; they generate messages at a constant rate and send each message to a randomly selected neighbor. By using different strategies to control the transmission permits we can obtain tradeoffs between message and time complexity. Gossip corresponds to the case when all nodes have permits all the time. We study algorithms where permits are revoked after transmission and restored upon reception. Examples of such algorithms include Simple-Random Walk(SRW), Coalescent-Random-Walk(CRW) and Controlled Flooding(CFLD) and their hybrid variants. SRW has a single node permit, which is passed on in the network. CRW, initially initially has a permit for each node but these permits are revoked gradually. The final result for SRW and CRW resides at a single(or few) random node(s) making a direct comparison with GOSSIP difficult. A hybrid two-phase algorithm switching from CRW to CFLD at a suitable pre-determined time can be employed to achieve consensus. We show that such hybrid variants achieve significant gains in both message and time complexity. The per-node message complexity for n-node graphs, such as 2D mesh, torii, and Random geometric graphs, scales as $O(polylog(n))$ and the corresponding time complexity scales as O(n). The reduced per-node message complexity leads to reduced energy utilization in sensor networks.

5.7CVApr 17, 2022
Learning Compositional Representations for Effective Low-Shot Generalization

Samarth Mishra, Pengkai Zhu, Venkatesh Saligrama

We propose Recognition as Part Composition (RPC), an image encoding approach inspired by human cognition. It is based on the cognitive theory that humans recognize complex objects by components, and that they build a small compact vocabulary of concepts to represent each instance with. RPC encodes images by first decomposing them into salient parts, and then encoding each part as a mixture of a small number of prototypes, each representing a certain concept. We find that this type of learning inspired by human cognition can overcome hurdles faced by deep convolutional networks in low-shot generalization tasks, like zero-shot learning, few-shot learning and unsupervised domain adaptation. Furthermore, we find a classifier using an RPC image encoder is fairly robust to adversarial attacks, that deep neural networks are known to be prone to. Given that our image encoding principle is based on human cognition, one would expect the encodings to be interpretable by humans, which we find to be the case via crowd-sourcing experiments. Finally, we propose an application of these interpretable encodings in the form of generating synthetic attribute annotations for evaluating zero-shot learning methods on new datasets.

5.3LGApr 30
Data Deletion Can Help in Adaptive RL

Param Budhraja, Aditya Gangrade, Alex Olshevsky et al.

Deploying reinforcement learning policies in the real world requires adapting to time-varying environments. We study this problem in the contextual Markov Decision Process (cMDP) framework, where a family of environments is indexed by a low-dimensional context unknown at test time. The standard approach decomposes the problem: train a so-called "universal policy" which assumes knowledge of the true context, then pair it with a context estimator which approximates context using the observed trajectory. We identify a simple, counterintuitive trick that substantially improves the estimator: randomly delete a fraction of the training buffer after each round. This works because data is collected across multiple rounds using progressively better policies, and older trajectories come from a different distribution than what the estimator will face at deployment time; random deletion creates an implicit exponential decay on older data while preserving diversity without requiring any explicit identification of which samples are stale. This reduces robustness gap by 30% for MLPs and by 6% on average for recurrent networks. Strikingly, it allows a narrow MLP with 5x fewer parameters to outperform a wide MLP trained without deletion. To understand when and why deletion helps, we analyze regularized empirical risk minimization with a mismatch between the train distribution and the distribution at deployment; in this idealized setting, we prove that removing a single uniformly random training point decreases expected test loss in expectation under mild conditions. For ridge regression we make this quantitative: deletion helps when the regularization coefficient is moderate and the signal-to-noise ratio (SNR) is sufficiently low, and, crucially, this SNR threshold gives a direct measure of how large the distribution mismatch between training and deployment must be for deletion to be beneficial.

1.4CVJul 14, 2022
Fine-grained Few-shot Recognition by Deep Object Parsing

Ruizhao Zhu, Pengkai Zhu, Samarth Mishra et al.

We propose a new method for fine-grained few-shot recognition via deep object parsing. In our framework, an object is made up of K distinct parts and for each part, we learn a dictionary of templates, which is shared across all instances and categories. An object is parsed by estimating the locations of these K parts and a set of active templates that can reconstruct the part features. We recognize test instances by comparing its active templates and the relative geometry of its part locations against those of the presented few-shot instances. Our method is end-to-end trainable to learn part templates on-top of a convolutional backbone. To combat visual distortions such as orientation, pose and size, we learn templates at multiple scales, and at test-time parse and match instances across these scales. We show that our method is competitive with the state-of-the-art, and by virtue of parsing enjoys interpretability as well.

1.8LGJul 7, 2022
FedHeN: Federated Learning in Heterogeneous Networks

Durmus Alp Emre Acar, Venkatesh Saligrama

We propose a novel training recipe for federated learning with heterogeneous networks where each device can have different architectures. We introduce training with a side objective to the devices of higher complexities to jointly train different architectures in a federated setting. We empirically show that our approach improves the performance of different architectures and leads to high communication savings compared to the state-of-the-art methods.

2.0LGFeb 1, 2023
Filtering Context Mitigates Scarcity and Selection Bias in Political Ideology Prediction

Chen Chen, Dylan Walker, Venkatesh Saligrama

We propose a novel supervised learning approach for political ideology prediction (PIP) that is capable of predicting out-of-distribution inputs. This problem is motivated by the fact that manual data-labeling is expensive, while self-reported labels are often scarce and exhibit significant selection bias. We propose a novel statistical model that decomposes the document embeddings into a linear superposition of two vectors; a latent neutral \emph{context} vector independent of ideology, and a latent \emph{position} vector aligned with ideology. We train an end-to-end model that has intermediate contextual and positional vectors as outputs. At deployment time, our model predicts labels for input documents by exclusively leveraging the predicted positional vectors. On two benchmark datasets we show that our model is capable of outputting predictions even when trained with as little as 5\% biased data, and is significantly more accurate than the state-of-the-art. Through crowd-sourcing we validate the neutrality of contextual vectors, and show that context filtering results in ideological concentration, allowing for prediction on out-of-distribution examples.

8.4CVDec 11, 2025
BabyVLM-V2: Toward Developmentally Grounded Pretraining and Benchmarking of Vision Foundation Models

Shengao Wang, Wenqi Wang, Zecheng Wang et al.

Early children's developmental trajectories set up a natural goal for sample-efficient pretraining of vision foundation models. We introduce BabyVLM-V2, a developmentally grounded framework for infant-inspired vision-language modeling that extensively improves upon BabyVLM-V1 through a longitudinal, multifaceted pretraining set, a versatile model, and, most importantly, DevCV Toolbox for cognitive evaluation. The pretraining set maximizes coverage while minimizing curation of a longitudinal, infant-centric audiovisual corpus, yielding video-utterance, image-utterance, and multi-turn conversational data that mirror infant experiences. DevCV Toolbox adapts all vision-related measures of the recently released NIH Baby Toolbox into a benchmark suite of ten multimodal tasks, covering spatial reasoning, memory, and vocabulary understanding aligned with early children's capabilities. Experimental results show that a compact model pretrained from scratch can achieve competitive performance on DevCV Toolbox, outperforming GPT-4o on some tasks. We hope the principled, unified BabyVLM-V2 framework will accelerate research in developmentally plausible pretraining of vision foundation models.

19.4AIMar 6
DeepFact: Co-Evolving Benchmarks and Agents for Deep Research Factuality

Yukun Huang, Leonardo F. R. Ribeiro, Momchil Hardalov et al.

Search-augmented LLM agents can produce deep research reports (DRRs), but verifying claim-level factuality remains challenging. Existing fact-checkers are primarily designed for general-domain, factoid-style atomic claims, and there is no benchmark to test whether such verifiers transfer to DRRs. Yet building such a benchmark is itself difficult. We first show that static expert-labeled benchmarks are brittle in this setting: in a controlled study with PhD-level specialists, unassisted experts achieve only 60.8% accuracy on a hidden micro-gold set of verifiable claims. We propose Evolving Benchmarking via Audit-then-Score (AtS), where benchmark labels and rationales are explicitly revisable: when a verifier disagrees with the current benchmark, it must submit evidence; an auditor adjudicates the dispute; and accepted revisions update the benchmark before models are scored. Across four AtS rounds, expert micro-gold accuracy rises to 90.9%, indicating experts are substantially more reliable as auditors than as one-shot labelers. We instantiate AtS as DeepFact-Bench, a versioned DRR factuality benchmark with auditable rationales, and DeepFact-Eval, a document-level verification agent (with a grouped lite variant) that outperforms existing verifiers on DeepFact-Bench and transfers well to external factuality datasets.

8.7CVMay 4, 2021Code
Effectively Leveraging Attributes for Visual Similarity

Samarth Mishra, Zhongping Zhang, Yuan Shen et al.

Measuring similarity between two images often requires performing complex reasoning along different axes (e.g., color, texture, or shape). Insights into what might be important for measuring similarity can can be provided by annotated attributes, but prior work tends to view these annotations as complete, resulting in them using a simplistic approach of predicting attributes on single images, which are, in turn, used to measure similarity. However, it is impractical for a dataset to fully annotate every attribute that may be important. Thus, only representing images based on these incomplete annotations may miss out on key information. To address this issue, we propose the Pairwise Attribute-informed similarity Network (PAN), which breaks similarity learning into capturing similarity conditions and relevance scores from a joint representation of two images. This enables our model to identify that two images contain the same attribute, but can have it deemed irrelevant (e.g., due to fine-grained differences between them) and ignored for measuring similarity between the two images. Notably, while prior methods of using attribute annotations are often unable to outperform prior art, PAN obtains a 4-9% improvement on compatibility prediction between clothing items on Polyvore Outfits, a 5% gain on few shot classification of images using Caltech-UCSD Birds (CUB), and over 1% boost to Recall@1 on In-Shop Clothes Retrieval. Implementation available at https://github.com/samarth4149/PAN

14.0CVJan 29, 2021Code
Surprisingly Simple Semi-Supervised Domain Adaptation with Pretraining and Consistency

Samarth Mishra, Kate Saenko, Venkatesh Saligrama

Most modern unsupervised domain adaptation (UDA) approaches are rooted in domain alignment, i.e., learning to align source and target features to learn a target domain classifier using source labels. In semi-supervised domain adaptation (SSDA), when the learner can access few target domain labels, prior approaches have followed UDA theory to use domain alignment for learning. We show that the case of SSDA is different and a good target classifier can be learned without needing alignment. We use self-supervised pretraining (via rotation prediction) and consistency regularization to achieve well separated target clusters, aiding in learning a low error target classifier. With our Pretraining and Consistency (PAC) approach, we achieve state of the art target accuracy on this semi-supervised domain adaptation task, surpassing multiple adversarial domain alignment methods, across multiple datasets. PAC, while using simple techniques, performs remarkably well on large and challenging SSDA benchmarks like DomainNet and Visda-17, often outperforming recent state of the art by sizeable margins. Code for our experiments can be found at https://github.com/venkatesh-saligrama/PAC

2.0CVJul 26, 2024
Deep Companion Learning: Enhancing Generalization Through Historical Consistency

Ruizhao Zhu, Venkatesh Saligrama

We propose Deep Companion Learning (DCL), a novel training method for Deep Neural Networks (DNNs) that enhances generalization by penalizing inconsistent model predictions compared to its historical performance. To achieve this, we train a deep-companion model (DCM), by using previous versions of the model to provide forecasts on new inputs. This companion model deciphers a meaningful latent semantic structure within the data, thereby providing targeted supervision that encourages the primary model to address the scenarios it finds most challenging. We validate our approach through both theoretical analysis and extensive experimentation, including ablation studies, on a variety of benchmark datasets (CIFAR-100, Tiny-ImageNet, ImageNet-1K) using diverse architectural models (ShuffleNetV2, ResNet, Vision Transformer, etc.), demonstrating state-of-the-art performance.

10.2CVFeb 24, 2025
SPARC: Score Prompting and Adaptive Fusion for Zero-Shot Multi-Label Recognition in Vision-Language Models

Kevin Miller, Samarth Mishra, Aditya Gangrade et al.

Zero-shot multi-label recognition (MLR) with Vision-Language Models (VLMs) faces significant challenges without training data, model tuning, or architectural modifications. Existing approaches require prompt tuning or architectural adaptations, limiting zero-shot applicability. Our work proposes a novel solution treating VLMs as black boxes, leveraging scores without training data or ground truth. Using large language model insights on object co-occurrence, we introduce compound prompts grounded in realistic object combinations. Analysis of these prompt scores reveals VLM biases and ``AND''/``OR'' signal ambiguities, notably that maximum compound scores are surprisingly suboptimal compared to second-highest scores. We address these through a debiasing and score-fusion algorithm that corrects image bias and clarifies VLM response behaviors. Our method enhances other zero-shot approaches, consistently improving their results. Experiments show superior mean Average Precision (mAP) compared to methods requiring training data, achieved through refined object ranking for robust zero-shot MLR.

9.4LGSep 24, 2025
Linear Transformers Implicitly Discover Unified Numerical Algorithms

Patrick Lutz, Aditya Gangrade, Hadi Daneshmand et al.

We train a linear attention transformer on millions of masked-block matrix completion tasks: each prompt is masked low-rank matrix whose missing block may be (i) a scalar prediction target or (ii) an unseen kernel slice of Nyström extrapolation. The model sees only input-output pairs and a mean-squared loss; it is given no normal equations, no handcrafted iterations, and no hint that the tasks are related. Surprisingly, after training, algebraic unrolling reveals the same parameter-free update rule across three distinct computational regimes (full visibility, rank-limited updates, and distributed computation). We prove that this rule achieves second-order convergence on full-batch problems, cuts distributed iteration complexity, and remains accurate with rank-limited attention. Thus, a transformer trained solely to patch missing blocks implicitly discovers a unified, resource-adaptive iterative solver spanning prediction, estimation, and Nyström extrapolation, highlighting a powerful capability of in-context learning.

11.8CVApr 13, 2025
BabyVLM: Data-Efficient Pretraining of VLMs Inspired by Infant Learning

Shengao Wang, Arjun Chandra, Aoming Liu et al.

Human infants rapidly develop visual reasoning skills from minimal input, suggesting that developmentally inspired pretraining could significantly enhance the efficiency of vision-language models (VLMs). Although recent efforts have leveraged infant-inspired datasets like SAYCam, existing evaluation benchmarks remain misaligned--they are either too simplistic, narrowly scoped, or tailored for large-scale pretrained models. Additionally, training exclusively on infant data overlooks the broader, diverse input from which infants naturally learn. To address these limitations, we propose BabyVLM, a novel framework comprising comprehensive in-domain evaluation benchmarks and a synthetic training dataset created via child-directed transformations of existing datasets. We demonstrate that VLMs trained with our synthetic dataset achieve superior performance on BabyVLM tasks compared to models trained solely on SAYCam or general-purpose data of the SAYCam size. BabyVLM thus provides a robust, developmentally aligned evaluation tool and illustrates how compact models trained on carefully curated data can generalize effectively, opening pathways toward data-efficient vision-language learning paradigms.

9.4LGMar 3, 2025
Constrained Linear Thompson Sampling

Aditya Gangrade, Venkatesh Saligrama

We study safe linear bandits (SLBs), where an agent selects actions from a convex set to maximize an unknown linear objective subject to unknown linear constraints in each round. Existing methods for SLBs provide strong regret guarantees, but require solving expensive optimization problems (e.g., second-order cones, NP hard programs). To address this, we propose Constrained Linear Thompson Sampling (COLTS), a sampling-based framework that selects actions by solving perturbed linear programs, which significantly reduces computational costs while matching the regret and risk of prior methods. We develop two main variants: S-COLTS, which ensures zero risk and $\widetilde{O}(\sqrt{d^3 T})$ regret given a safe action, and R-COLTS, which achieves $\widetilde{O}(\sqrt{d^3 T})$ regret and risk with no instance information. In simulations, these methods match or outperform state of the art SLB approaches while substantially improving scalability. On the technical front, we introduce a novel coupled noise design that ensures frequent `local optimism' about the true optimum, and a scaling-based analysis to handle the per-round variability of constraints.

7.9LGJun 21, 2024
Testing the Feasibility of Linear Programs with Bandit Feedback

Aditya Gangrade, Aditya Gopalan, Venkatesh Saligrama et al.

While the recent literature has seen a surge in the study of constrained bandit problems, all existing methods for these begin by assuming the feasibility of the underlying problem. We initiate the study of testing such feasibility assumptions, and in particular address the problem in the linear bandit setting, thus characterising the costs of feasibility testing for an unknown linear program using bandit feedback. Concretely, we test if $\exists x: Ax \ge 0$ for an unknown $A \in \mathbb{R}^{m \times d}$, by playing a sequence of actions $x_t\in \mathbb{R}^d$, and observing $Ax_t + \mathrm{noise}$ in response. By identifying the hypothesis as determining the sign of the value of a minimax game, we construct a novel test based on low-regret algorithms and a nonasymptotic law of iterated logarithms. We prove that this test is reliable, and adapts to the `signal level,' $Γ,$ of any instance, with mean sample costs scaling as $\widetilde{O}(d^2/Γ^2)$. We complement this by a minimax lower bound of $Ω(d/Γ^2)$ for sample costs of reliable tests, dominating prior asymptotic lower bounds by capturing the dependence on $d$, and thus elucidating a basic insight missing in the extant literature on such problems.

14.4CVNov 30, 2021
Task2Sim : Towards Effective Pre-training and Transfer from Synthetic Data

Samarth Mishra, Rameswar Panda, Cheng Perng Phoo et al.

Pre-training models on Imagenet or other massive datasets of real images has led to major advances in computer vision, albeit accompanied with shortcomings related to curation cost, privacy, usage rights, and ethical issues. In this paper, for the first time, we study the transferability of pre-trained models based on synthetic data generated by graphics simulators to downstream tasks from very different domains. In using such synthetic data for pre-training, we find that downstream performance on different tasks are favored by different configurations of simulation parameters (e.g. lighting, object pose, backgrounds, etc.), and that there is no one-size-fits-all solution. It is thus better to tailor synthetic pre-training data to a specific downstream task, for best performance. We introduce Task2Sim, a unified model mapping downstream task representations to optimal simulation parameters to generate synthetic pre-training data for them. Task2Sim learns this mapping by training to find the set of best parameters on a set of "seen" tasks. Once trained, it can then be used to predict best simulation parameters for novel "unseen" tasks in one shot, without requiring additional training. Given a budget in number of images per class, our extensive experiments with 20 diverse downstream tasks show Task2Sim's task-adaptive pre-training data results in significantly better downstream performance than non-adaptively choosing simulation parameters on both seen and unseen tasks. It is even competitive with pre-training on real images from Imagenet.

47.0LGNov 8, 2021Code
Federated Learning Based on Dynamic Regularization

Durmus Alp Emre Acar, Yue Zhao, Ramon Matas Navarro et al.

We propose a novel federated learning method for distributively training neural network models, where the server orchestrates cooperation between a subset of randomly chosen devices in each round. We view Federated Learning problem primarily from a communication perspective and allow more device level computations to save transmission costs. We point out a fundamental dilemma, in that the minima of the local-device level empirical loss are inconsistent with those of the global empirical loss. Different from recent prior works, that either attempt inexact minimization or utilize devices for parallelizing gradient computation, we propose a dynamic regularizer for each device at each round, so that in the limit the global and device solutions are aligned. We demonstrate both through empirical results on real and synthetic data as well as analytical results that our scheme leads to efficient training, in both convex and non-convex settings, while being fully agnostic to device heterogeneity and robust to large number of devices, partial participation and unbalanced data.

9.4MLNov 2, 2021
Faster Algorithms for Learning Convex Functions

Ali Siahkamari, Durmus Alp Emre Acar, Christopher Liao et al.

The task of approximating an arbitrary convex function arises in several learning problems such as convex regression, learning with a difference of convex (DC) functions, and learning Bregman or $f$-divergences. In this paper, we develop and analyze an approach for solving a broad range of convex function learning problems that is faster than state-of-the-art approaches. Our approach is based on a 2-block ADMM method where each block can be computed in closed form. For the task of convex Lipschitz regression, we establish that our proposed algorithm converges with iteration complexity of $ O(n\sqrt{d}/ε)$ for a dataset $\bm X \in \mathbb R^{n\times d}$ and $ε> 0$. Combined with per-iteration computation complexity, our method converges with the rate $O(n^3 d^{1.5}/ε+n^2 d^{2.5}/ε+n d^3/ε)$. This new rate improves the state of the art rate of $O(n^5d^2/ε)$ if $d = o( n^4)$. Further we provide similar solvers for DC regression and Bregman divergence learning. Unlike previous approaches, our method is amenable to the use of GPUs. We demonstrate on regression and metric learning experiments that our approach is over 100 times faster than existing approaches on some data sets, and produces results that are comparable to state of the art.

6.5LGOct 27, 2021Code
Online Selective Classification with Limited Feedback

Aditya Gangrade, Anil Kag, Ashok Cutkosky et al.

Motivated by applications to resource-limited and safety-critical domains, we study selective classification in the online learning model, wherein a predictor may abstain from classifying an instance. For example, this may model an adaptive decision to invoke more resources on this instance. Two salient aspects of the setting we consider are that the data may be non-realisable, due to which abstention may be a valid long-term action, and that feedback is only received when the learner abstains, which models the fact that reliable labels are only available when the resource intensive processing is invoked. Within this framework, we explore strategies that make few mistakes, while not abstaining too many times more than the best-in-hindsight error-free classifier from a given class. That is, the one that makes no mistakes, while abstaining the fewest number of times. We construct simple versioning-based schemes for any $μ\in (0,1],$ that make most $T^μ$ mistakes while incurring \smash{$\tilde{O}(T^{1-μ})$} excess abstention against adaptive adversaries. We further show that this dependence on $T$ is tight, and provide illustrative experiments on realistic datasets.

9.9LGJul 22, 2021
Bandit Quickest Changepoint Detection

Aditya Gopalan, Venkatesh Saligrama, Braghadeesh Lakshminarayanan

Many industrial and security applications employ a suite of sensors for detecting abrupt changes in temporal behavior patterns. These abrupt changes typically manifest locally, rendering only a small subset of sensors informative. Continuous monitoring of every sensor can be expensive due to resource constraints, and serves as a motivation for the bandit quickest changepoint detection problem, where sensing actions (or sensors) are sequentially chosen, and only measurements corresponding to chosen actions are observed. We derive an information-theoretic lower bound on the detection delay for a general class of finitely parameterized probability distributions. We then propose a computationally efficient online sensing scheme, which seamlessly balances the need for exploration of different sensing options with exploitation of querying informative actions. We derive expected delay bounds for the proposed scheme and show that these bounds match our information-theoretic lower bounds at low false alarm rates, establishing optimality of the proposed method. We then perform a number of experiments on synthetic and real datasets demonstrating the effectiveness of our proposed method.

3.3LGOct 23, 2020
Online Algorithm for Unsupervised Sequential Selection with Contextual Information

Arun Verma, Manjesh K. Hanawal, Csaba Szepesvári et al.

In this paper, we study Contextual Unsupervised Sequential Selection (USS), a new variant of the stochastic contextual bandits problem where the loss of an arm cannot be inferred from the observed feedback. In our setup, arms are associated with fixed costs and are ordered, forming a cascade. In each round, a context is presented, and the learner selects the arms sequentially till some depth. The total cost incurred by stopping at an arm is the sum of fixed costs of arms selected and the stochastic loss associated with the arm. The learner's goal is to learn a decision rule that maps contexts to arms with the goal of minimizing the total expected loss. The problem is challenging as we are faced with an unsupervised setting as the total loss cannot be estimated. Clearly, learning is feasible only if the optimal arm can be inferred (explicitly or implicitly) from the problem structure. We observe that learning is still possible when the problem instance satisfies the so-called 'Contextual Weak Dominance' (CWD) property. Under CWD, we propose an algorithm for the contextual USS problem and demonstrate that it has sub-linear regret. Experiments on synthetic and real datasets validate our algorithm.

14.7LGOct 15, 2020Code
Selective Classification via One-Sided Prediction

Aditya Gangrade, Anil Kag, Venkatesh Saligrama

We propose a novel method for selective classification (SC), a problem which allows a classifier to abstain from predicting some instances, thus trading off accuracy against coverage (the fraction of instances predicted). In contrast to prior gating or confidence-set based work, our proposed method optimises a collection of class-wise decoupled one-sided empirical risks, and is in essence a method for explicitly finding the largest decision sets for each class that have few false positives. This one-sided prediction (OSP) based relaxation yields an SC scheme that attains near-optimal coverage in the practically relevant high target accuracy regime, and further admits efficient implementation, leading to a flexible and principled method for SC. We theoretically derive generalization bounds for SC and OSP, and empirically we show that our scheme strongly outperforms state of the art methods in coverage at small error levels.

4.2LGOct 12, 2020Code
RNN Training along Locally Optimal Trajectories via Frank-Wolfe Algorithm

Yun Yue, Ming Li, Venkatesh Saligrama et al.

We propose a novel and efficient training method for RNNs by iteratively seeking a local minima on the loss surface within a small region, and leverage this directional vector for the update, in an outer-loop. We propose to utilize the Frank-Wolfe (FW) algorithm in this context. Although, FW implicitly involves normalized gradients, which can lead to a slow convergence rate, we develop a novel RNN training method that, surprisingly, even with the additional cost, the overall training cost is empirically observed to be lower than back-propagation. Our method leads to a new Frank-Wolfe method, that is in essence an SGD algorithm with a restart scheme. We prove that under certain conditions our algorithm has a sublinear convergence rate of $O(1/ε)$ for $ε$ error. We then conduct empirical experiments on several benchmark datasets including those that exhibit long-term dependencies, and show significant performance improvement. We also experiment with deep RNN architectures and show efficient training performance. Finally, we demonstrate that our training method is robust to noisy data.

12.0MLJul 5, 2020Code
Piecewise Linear Regression via a Difference of Convex Functions

Ali Siahkamari, Aditya Gangrade, Brian Kulis et al.

We present a new piecewise linear regression methodology that utilizes fitting a difference of convex functions (DC functions) to the data. These are functions $f$ that may be represented as the difference $φ_1 - φ_2$ for a choice of convex functions $φ_1, φ_2$. The method proceeds by estimating piecewise-liner convex functions, in a manner similar to max-affine regression, whose difference approximates the data. The choice of the function is regularised by a new seminorm over the class of DC functions that controls the $\ell_\infty$ Lipschitz constant of the estimate. The resulting methodology can be efficiently implemented via Quadratic programming even in high dimensions, and is shown to have close to minimax statistical risk. We empirically validate the method, showing it to be practically implementable, and to have comparable performance to existing regression/classification methods on real-world datasets.

5.8LGApr 14, 2020
Budget Learning via Bracketing

Aditya Gangrade, Durmus Alp Emre Acar, Venkatesh Saligrama

Conventional machine learning applications in the mobile/IoT setting transmit data to a cloud-server for predictions. Due to cost considerations (power, latency, monetary), it is desirable to minimise device-to-server transmissions. The budget learning (BL) problem poses the learner's goal as minimising use of the cloud while suffering no discernible loss in accuracy, under the constraint that the methods employed be edge-implementable. We propose a new formulation for the BL problem via the concept of bracketings. Concretely, we propose to sandwich the cloud's prediction, $g,$ via functions $h^-, h^+$ from a `simple' class so that $h^- \le g \le h^+$ nearly always. On an instance $x$, if $h^+(x)=h^-(x)$, we leverage local processing, and bypass the cloud. We explore theoretical aspects of this formulation, providing PAC-style learnability definitions; associating the notion of budget learnability to approximability via brackets; and giving VC-theoretic analyses of their properties. We empirically validate our theory on real-world datasets, demonstrating improved performance over prior gating based methods.

17.1CVNov 18, 2019
Dont Even Look Once: Synthesizing Features for Zero-Shot Detection

Pengkai Zhu, Hanxiao Wang, Venkatesh Saligrama

Zero-shot detection, namely, localizing both seen and unseen objects, increasingly gains importance for large-scale applications, with large number of object classes, since, collecting sufficient annotated data with ground truth bounding boxes is simply not scalable. While vanilla deep neural networks deliver high performance for objects available during training, unseen object detection degrades significantly. At a fundamental level, while vanilla detectors are capable of proposing bounding boxes, which include unseen objects, they are often incapable of assigning high-confidence to unseen objects, due to the inherent precision/recall tradeoffs that requires rejecting background objects. We propose a novel detection algorithm Dont Even Look Once (DELO), that synthesizes visual features for unseen objects and augments existing training algorithms to incorporate unseen object detection. Our proposed scheme is evaluated on Pascal VOC and MSCOCO, and we demonstrate significant improvements in test accuracy over vanilla and other state-of-art zero-shot detectors

8.6LGAug 22, 2019
RNNs Evolving on an Equilibrium Manifold: A Panacea for Vanishing and Exploding Gradients?

Anil Kag, Ziming Zhang, Venkatesh Saligrama

Recurrent neural networks (RNNs) are particularly well-suited for modeling long-term dependencies in sequential data, but are notoriously hard to train because the error backpropagated in time either vanishes or explodes at an exponential rate. While a number of works attempt to mitigate this effect through gated recurrent units, well-chosen parametric constraints, and skip-connections, we develop a novel perspective that seeks to evolve the hidden state on the equilibrium manifold of an ordinary differential equation (ODE). We propose a family of novel RNNs, namely {\em Equilibriated Recurrent Neural Networks} (ERNNs) that overcome the gradient decay or explosion effect and lead to recurrent models that evolve on the equilibrium manifold. We show that equilibrium points are stable, leading to fast convergence of the discretized ODE to fixed points. Furthermore, ERNNs account for long-term dependencies, and can efficiently recall informative aspects of data from the distant past. We show that ERNNs achieve state-of-the-art accuracy on many challenging data sets with 3-10x speedups, 1.5-3x model size reduction, and with similar prediction cost relative to vanilla RNNs.

8.8MLMay 28, 2019Code
Learning to Approximate a Bregman Divergence

Ali Siahkamari, Xide Xia, Venkatesh Saligrama et al.

Bregman divergences generalize measures such as the squared Euclidean distance and the KL divergence, and arise throughout many areas of machine learning. In this paper, we focus on the problem of approximating an arbitrary Bregman divergence from supervision, and we provide a well-principled approach to analyzing such approximations. We develop a formulation and algorithm for learning arbitrary Bregman divergences based on approximating their underlying convex generating function via a piecewise linear function. We provide theoretical approximation bounds using our parameterization and show that the generalization error $O_p(m^{-1/2})$ for metric learning using our framework matches the known generalization error in the strictly less general Mahalanobis metric learning setting. We further demonstrate empirically that our method performs well in comparison to existing metric learning methods, particularly for clustering and ranking problems.

21.4HCApr 25, 2019
Gradient Descent for Sparse Rank-One Matrix Completion for Crowd-Sourced Aggregation of Sparsely Interacting Workers

Yao Ma, Alex Olshevsky, Venkatesh Saligrama et al.

We consider worker skill estimation for the single-coin Dawid-Skene crowdsourcing model. In practice, skill-estimation is challenging because worker assignments are sparse and irregular due to the arbitrary and uncontrolled availability of workers. We formulate skill estimation as a rank-one correlation-matrix completion problem, where the observed components correspond to observed label correlations between workers. We show that the correlation matrix can be successfully recovered and skills are identifiable if and only if the sampling matrix (observed components) does not have a bipartite connected component. We then propose a projected gradient descent scheme and show that skill estimates converge to the desired global optima for such sampling matrices. Our proof is original and the results are surprising in light of the fact that even the weighted rank-one matrix factorization problem is NP-hard in general. Next, we derive sample complexity bounds in terms of spectral properties of the signless Laplacian of the sampling matrix. Our proposed scheme achieves state-of-art performance on a number of real-world datasets.

5.4LGMar 2, 2019
Equilibrated Recurrent Neural Network: Neuronal Time-Delayed Self-Feedback Improves Accuracy and Stability

Ziming Zhang, Anil Kag, Alan Sullivan et al.

We propose a novel {\it Equilibrated Recurrent Neural Network} (ERNN) to combat the issues of inaccuracy and instability in conventional RNNs. Drawing upon the concept of autapse in neuroscience, we propose augmenting an RNN with a time-delayed self-feedback loop. Our sole purpose is to modify the dynamics of each internal RNN state and, at any time, enforce it to evolve close to the equilibrium point associated with the input signal at that time. We show that such self-feedback helps stabilize the hidden state transitions leading to fast convergence during training while efficiently learning discriminative latent features that result in state-of-the-art results on several benchmark datasets at test-time. We propose a novel inexact Newton method to solve fixed-point conditions given model parameters for generating the latent features at each hidden state. We prove that our inexact Newton method converges locally with linear rate (under mild conditions). We leverage this result for efficient training of ERNNs based on backpropagation.

4.1LGFeb 1, 2019
Graph Resistance and Learning from Pairwise Comparisons

Julien M. Hendrickx, Alex Olshevsky, Venkatesh Saligrama

We consider the problem of learning the qualities of a collection of items by performing noisy comparisons among them. Following the standard paradigm, we assume there is a fixed "comparison graph" and every neighboring pair of items in this graph is compared $k$ times according to the Bradley-Terry-Luce model (where the probability than an item wins a comparison is proportional the item quality). We are interested in how the relative error in quality estimation scales with the comparison graph in the regime where $k$ is large. We prove that, after a known transition period, the relevant graph-theoretic quantity is the square root of the resistance of the comparison graph. Specifically, we provide an algorithm that is minimax optimal. The algorithm has a relative error decay that scales with the square root of the graph resistance, and provide a matching lower bound (up to log factors). The performance guarantee of our algorithm, both in terms of the graph and the skewness of the item quality distribution, outperforms earlier results.

4.7CVJan 25, 2019
Learning Classifiers for Domain Adaptation, Zero and Few-Shot Recognition Based on Learning Latent Semantic Parts

Pengkai Zhu, Hanxiao Wang, Venkatesh Saligrama

In computer vision applications, such as domain adaptation (DA), few shot learning (FSL) and zero-shot learning (ZSL), we encounter new objects and environments, for which insufficient examples exist to allow for training "models from scratch," and methods that adapt existing models, trained on the presented training environment, to the new scenario are required. We propose a novel visual attribute encoding method that encodes each image as a low-dimensional probability vector composed of prototypical part-type probabilities. The prototypes are learnt to be representative of all training data. At test-time we utilize this encoding as an input to a classifier. At test-time we freeze the encoder and only learn/adapt the classifier component to limited annotated labels in FSL; new semantic attributes in ZSL. We conduct extensive experiments on benchmark datasets. Our method outperforms state-of-art methods trained for the specific contexts (ZSL, FSL, DA).

5.4LGJan 15, 2019
Online Algorithm for Unsupervised Sensor Selection

Arun Verma, Manjesh K. Hanawal, Csaba Szepesvári et al.

In many security and healthcare systems, the detection and diagnosis systems use a sequence of sensors/tests. Each test outputs a prediction of the latent state and carries an inherent cost. However, the correctness of the predictions cannot be evaluated since the ground truth annotations may not be available. Our objective is to learn strategies for selecting a test that gives the best trade-off between accuracy and costs in such Unsupervised Sensor Selection (USS) problems. Clearly, learning is feasible only if ground truth can be inferred (explicitly or implicitly) from the problem structure. It is observed that this happens if the problem satisfies the 'Weak Dominance' (WD) property. We set up the USS problem as a stochastic partial monitoring problem and develop an algorithm with sub-linear regret under the WD property. We argue that our algorithm is optimal and evaluate its performance on problem instances generated from synthetic and real-world datasets.

3.3ITNov 29, 2018
Testing Changes in Communities for the Stochastic Block Model

Aditya Gangrade, Praveen Venkatesh, Bobak Nazer et al.

We propose and analyze the problems of \textit{community goodness-of-fit and two-sample testing} for stochastic block models (SBM), where changes arise due to modification in community memberships of nodes. Motivated by practical applications, we consider the challenging sparse regime, where expected node degrees are constant, and the inter-community mean degree ($b$) scales proportionally to intra-community mean degree ($a$). Prior work has sharply characterized partial or full community recovery in terms of a "signal-to-noise ratio" ($\mathrm{SNR}$) based on $a$ and $b$. For both problems, we propose computationally-efficient tests that can succeed far beyond the regime where recovery of community membership is even possible. Overall, for large changes, $s \gg \sqrt{n}$, we need only $\mathrm{SNR}= O(1)$ whereas a naïve test based on community recovery with $O(s)$ errors requires $\mathrm{SNR}= Θ(\log n)$. Conversely, in the small change regime, $s \ll \sqrt{n}$, via an information-theoretic lower bound, we show that, surprisingly, no algorithm can do better than the naïve algorithm that first estimates the community up to $O(s)$ errors and then detects changes. We validate these phenomena numerically on SBMs and on real-world datasets as well as Markov Random Fields where we only observe node data rather than the existence of links.

17.7CVNov 19, 2018
Generalized Zero-Shot Recognition based on Visually Semantic Embedding

Pengkai Zhu, Hanxiao Wang, Venkatesh Saligrama

We propose a novel Generalized Zero-Shot learning (GZSL) method that is agnostic to both unseen images and unseen semantic vectors during training. Prior works in this context propose to map high-dimensional visual features to the semantic domain, we believe contributes to the semantic gap. To bridge the gap, we propose a novel low-dimensional embedding of visual instances that is "visually semantic." Analogous to semantic data that quantifies the existence of an attribute in the presented instance, components of our visual embedding quantifies existence of a prototypical part-type in the presented instance. In parallel, as a thought experiment, we quantify the impact of noisy semantic data by utilizing a novel visual oracle to visually supervise a learner. These factors, namely semantic noise, visual-semantic gap and label noise lead us to propose a new graphical model for inference with pairwise interactions between label, semantic data, and inputs. We tabulate results on a number of benchmark datasets demonstrating significant improvement in accuracy over state-of-the-art under both semantic and visual supervision.

4.6CVNov 16, 2018
Cost-Aware Fine-Grained Recognition for IoTs Based on Sequential Fixations

Hanxiao Wang, Venkatesh Saligrama, Stan Sclaroff et al.

We consider the problem of fine-grained classification on an edge camera device that has limited power. The edge device must sparingly interact with the cloud to minimize communication bits to conserve power, and the cloud upon receiving the edge inputs returns a classification label. To deal with fine-grained classification, we adopt the perspective of sequential fixation with a foveated field-of-view to model cloud-edge interactions. We propose a novel deep reinforcement learning-based foveation model, DRIFT, that sequentially generates and recognizes mixed-acuity images.Training of DRIFT requires only image-level category labels and encourages fixations to contain task-relevant information, while maintaining data efficiency. Specifically, wetrain a foveation actor network with a novel Deep Deterministic Policy Gradient by Conditioned Critic and Coaching (DDPGC3) algorithm. In addition, we propose to shape the reward to provide informative feedback after each fixation to better guide RL training. We demonstrate the effectiveness of DRIFT on this task by evaluating on five fine-grained classification benchmark datasets, and show that the proposed approach achieves state-of-the-art performance with over 3X reduction in transmitted pixels.

48.1LGAug 24, 2018Code
Robust Text Classifier on Test-Time Budgets

Md Rizwan Parvez, Tolga Bolukbasi, Kai-Wei Chang et al.

We propose a generic and interpretable learning framework for building robust text classification model that achieves accuracy comparable to full models under test-time budget constraints. Our approach learns a selector to identify words that are relevant to the prediction tasks and passes them to the classifier for processing. The selector is trained jointly with the classifier and directly learns to incorporate with the classifier. We further propose a data aggregation scheme to improve the robustness of the classifier. Our learning framework is general and can be incorporated with any type of text classification model. On real-world data, we show that the proposed approach improves the performance of a given classifier and speeds up the model with a mere loss in accuracy performance.

13.8CVMar 19, 2018
Zero-Shot Detection

Pengkai Zhu, Hanxiao Wang, Venkatesh Saligrama

As we move towards large-scale object detection, it is unrealistic to expect annotated training data, in the form of bounding box annotations around objects, for all object classes at sufficient scale, and so methods capable of unseen object detection are required. We propose a novel zero-shot method based on training an end-to-end model that fuses semantic attribute prediction with visual features to propose object bounding boxes for seen and unseen classes. While we utilize semantic features during training, our method is agnostic to semantic information for unseen classes at test-time. Our method retains the efficiency and effectiveness of YOLOv2 for objects seen during training, while improving its performance for novel and unseen objects. The ability of state-of-art detection methods to learn discriminative object features to reject background proposals also limits their performance for unseen objects. We posit that, to detect unseen objects, we must incorporate semantic information into the visual domain so that the learned visual features reflect this information and leads to improved recall rates for unseen objects. We test our method on PASCAL VOC and MS COCO dataset and observed significant improvements on the average precision of unseen classes.

5.9MMDec 17, 2017
Probabilistic Semantic Retrieval for Surveillance Videos with Activity Graphs

Yuting Chen, Joseph Wang, Yannan Bai et al.

We present a novel framework for finding complex activities matching user-described queries in cluttered surveillance videos. The wide diversity of queries coupled with unavailability of annotated activity data limits our ability to train activity models. To bridge the semantic gap we propose to let users describe an activity as a semantic graph with object attributes and inter-object relationships associated with nodes and edges, respectively. We learn node/edge-level visual predictors during training and, at test-time, propose to retrieve activity by identifying likely locations that match the semantic graph. We formulate a novel CRF based probabilistic activity localization objective that accounts for mis-detections, mis-classifications and track-losses, and outputs a likelihood score for a candidate grounded location of the query in the video. We seek groundings that maximize overall precision and recall. To handle the combinatorial search over all high-probability groundings, we propose a highest precision subgraph matching algorithm. Our method outperforms existing retrieval methods on benchmarked datasets.

11.7ITOct 28, 2017
Lower Bounds for Two-Sample Structural Change Detection in Ising and Gaussian Models

Aditya Gangrade, Bobak Nazer, Venkatesh Saligrama

The change detection problem is to determine if the Markov network structures of two Markov random fields differ from one another given two sets of samples drawn from the respective underlying distributions. We study the trade-off between the sample sizes and the reliability of change detection, measured as a minimax risk, for the important cases of the Ising models and the Gaussian Markov random fields restricted to the models which have network structures with $p$ nodes and degree at most $d$, and obtain information-theoretic lower bounds for reliable change detection over these models. We show that for the Ising model, $Ω\left(\frac{d^2}{(\log d)^2}\log p\right)$ samples are required from each dataset to detect even the sparsest possible changes, and that for the Gaussian, $Ω\left( γ^{-2} \log(p)\right)$ samples are required from each dataset to detect change, where $γ$ is the smallest ratio of off-diagonal to diagonal terms in the precision matrices of the distributions. These bounds are compared to the corresponding results in structure learning, and closely match them under mild conditions on the model parameters. Thus, our change detection bounds inherit partial tightness from the structure learning schemes in previous literature, demonstrating that in certain parameter regimes, the naive structure learning based approach to change detection is minimax optimal up to constant factors.

2.0LGJun 20, 2017
Crowdsourcing with Sparsely Interacting Workers

Yao Ma, Alex Olshevsky, Venkatesh Saligrama et al.

We consider estimation of worker skills from worker-task interaction data (with unknown labels) for the single-coin crowd-sourcing binary classification model in symmetric noise. We define the (worker) interaction graph whose nodes are workers and an edge between two nodes indicates whether or not the two workers participated in a common task. We show that skills are asymptotically identifiable if and only if an appropriate limiting version of the interaction graph is irreducible and has odd-cycles. We then formulate a weighted rank-one optimization problem to estimate skills based on observations on an irreducible, aperiodic interaction graph. We propose a gradient descent scheme and show that for such interaction graphs estimates converge asymptotically to the global minimum. We characterize noise robustness of the gradient scheme in terms of spectral properties of signless Laplacians of the interaction graph. We then demonstrate that a plug-in estimator based on the estimated skills achieves state-of-art performance on a number of real-world datasets. Our results have implications for rank-one matrix completion problem in that gradient descent can provably recover $W \times W$ rank-one matrices based on $W+1$ off-diagonal observations of a connected graph with a single odd-cycle.

1.8MLMay 31, 2017
Sequential Dynamic Decision Making with Deep Neural Nets on a Test-Time Budget

Henghui Zhu, Feng Nan, Ioannis Paschalidis et al.

Deep neural network (DNN) based approaches hold significant potential for reinforcement learning (RL) and have already shown remarkable gains over state-of-art methods in a number of applications. The effectiveness of DNN methods can be attributed to leveraging the abundance of supervised data to learn value functions, Q-functions, and policy function approximations without the need for feature engineering. Nevertheless, the deployment of DNN-based predictors with very deep architectures can pose an issue due to computational and other resource constraints at test-time in a number of applications. We propose a novel approach for reducing the average latency by learning a computationally efficient gating function that is capable of recognizing states in a sequential decision process for which policy prescriptions of a shallow network suffices and deeper layers of the DNN have little marginal utility. The overall system is adaptive in that it dynamically switches control actions based on state-estimates in order to reduce average latency without sacrificing terminal performance. We experiment with a number of alternative loss-functions to train gating functions and shallow policies and show that in a number of applications a speed-up of up to almost 5X can be obtained with little loss in performance.

13.1MLMay 26, 2017
Adaptive Classification for Prediction Under a Budget

Feng Nan, Venkatesh Saligrama

We propose a novel adaptive approximation approach for test-time resource-constrained prediction. Given an input instance at test-time, a gating function identifies a prediction model for the input among a collection of models. Our objective is to minimize overall average cost without sacrificing accuracy. We learn gating and prediction models on fully labeled training data by means of a bottom-up strategy. Our novel bottom-up method first trains a high-accuracy complex model. Then a low-complexity gating and prediction model are subsequently learned to adaptively approximate the high-accuracy model in regions where low-cost models are capable of making highly accurate predictions. We pose an empirical loss minimization problem with cost constraints to jointly train gating and prediction models. On a number of benchmark datasets our method outperforms state-of-the-art achieving higher accuracy for the same cost.

1.0MLMay 10, 2017
Comments on the proof of adaptive submodular function minimization

Feng Nan, Venkatesh Saligrama

We point out an issue with Theorem 5 appearing in "Group-based active query selection for rapid diagnosis in time-critical situations". Theorem 5 bounds the expected number of queries for a greedy algorithm to identify the class of an item within a constant factor of optimal. The Theorem is based on correctness of a result on minimization of adaptive submodular functions. We present an example that shows that a critical step in Theorem A.11 of "Adaptive Submodularity: Theory and Applications in Active Learning and Stochastic Optimization" is incorrect.