Urun Dogan

CV
h-index21
15papers
628citations
Novelty50%
AI Score40

15 Papers

LGJun 1, 2023
IQL-TD-MPC: Implicit Q-Learning for Hierarchical Model Predictive Control

Rohan Chitnis, Yingchen Xu, Bobak Hashemi et al.

Model-based reinforcement learning (RL) has shown great promise due to its sample efficiency, but still struggles with long-horizon sparse-reward tasks, especially in offline settings where the agent learns from a fixed dataset. We hypothesize that model-based RL agents struggle in these environments due to a lack of long-term planning capabilities, and that planning in a temporally abstract model of the environment can alleviate this issue. In this paper, we make two key contributions: 1) we introduce an offline model-based RL algorithm, IQL-TD-MPC, that extends the state-of-the-art Temporal Difference Learning for Model Predictive Control (TD-MPC) with Implicit Q-Learning (IQL); 2) we propose to use IQL-TD-MPC as a Manager in a hierarchical setting with any off-the-shelf offline RL algorithm as a Worker. More specifically, we pre-train a temporally abstract IQL-TD-MPC Manager to predict "intent embeddings", which roughly correspond to subgoals, via planning. We empirically show that augmenting state representations with intent embeddings generated by an IQL-TD-MPC manager significantly improves off-the-shelf offline RL agents' performance on some of the most challenging D4RL benchmark tasks. For instance, the offline RL algorithms AWAC, TD3-BC, DT, and CQL all get zero or near-zero normalized evaluation scores on the medium and large antmaze tasks, while our modification gives an average score over 40.

LGDec 6, 2023Code
Pearl: A Production-ready Reinforcement Learning Agent

Zheqing Zhu, Rodrigo de Salvo Braz, Jalaj Bhandari et al.

Reinforcement learning (RL) is a versatile framework for optimizing long-term goals. Although many real-world problems can be formalized with RL, learning and deploying a performant RL policy requires a system designed to address several important challenges, including the exploration-exploitation dilemma, partial observability, dynamic action spaces, and safety concerns. While the importance of these challenges has been well recognized, existing open-source RL libraries do not explicitly address them. This paper introduces Pearl, a Production-Ready RL software package designed to embrace these challenges in a modular way. In addition to presenting benchmarking results, we also highlight examples of Pearl's ongoing industry adoption to demonstrate its advantages for production use cases. Pearl is open sourced on GitHub at github.com/facebookresearch/pearl and its official website is pearlagent.github.io.

LGOct 7, 2021Code
Offline RL With Resource Constrained Online Deployment

Jayanth Reddy Regatti, Aniket Anand Deshmukh, Frank Cheng et al.

Offline reinforcement learning is used to train policies in scenarios where real-time access to the environment is expensive or impossible. As a natural consequence of these harsh conditions, an agent may lack the resources to fully observe the online environment before taking an action. We dub this situation the resource-constrained setting. This leads to situations where the offline dataset (available for training) can contain fully processed features (using powerful language models, image models, complex sensors, etc.) which are not available when actions are actually taken online. This disconnect leads to an interesting and unexplored problem in offline RL: Is it possible to use a richly processed offline dataset to train a policy which has access to fewer features in the online environment? In this work, we introduce and formalize this novel resource-constrained problem setting. We highlight the performance gap between policies trained using the full offline dataset and policies trained using limited features. We address this performance gap with a policy transfer algorithm which first trains a teacher agent using the offline dataset where features are fully available, and then transfers this knowledge to a student agent that only uses the resource-constrained features. To better capture the challenge of this setting, we propose a data collection procedure: Resource Constrained-Datasets for RL (RC-D4RL). We evaluate our transfer algorithm on RC-D4RL and the popular D4RL benchmarks and observe consistent improvement over the baseline (TD3+BC without transfer). The code for the experiments is available at https://github.com/JayanthRR/RC-OfflineRL.

CVMay 4, 2021Code
Representation Learning for Clustering via Building Consensus

Aniket Anand Deshmukh, Jayanth Reddy Regatti, Eren Manavoglu et al.

In this paper, we focus on unsupervised representation learning for clustering of images. Recent advances in deep clustering and unsupervised representation learning are based on the idea that different views of an input image (generated through data augmentation techniques) must be close in the representation space (exemplar consistency), and/or similar images must have similar cluster assignments (population consistency). We define an additional notion of consistency, consensus consistency, which ensures that representations are learned to induce similar partitions for variations in the representation space, different clustering algorithms or different initializations of a single clustering algorithm. We define a clustering loss by executing variations in the representation space and seamlessly integrate all three consistencies (consensus, exemplar and population) into an end-to-end learning framework. The proposed algorithm, consensus clustering using unsupervised representation learning (ConCURL), improves upon the clustering performance of state-of-the-art methods on four out of five image datasets. Furthermore, we extend the evaluation procedure for clustering to reflect the challenges encountered in real-world clustering tasks, such as maintaining clustering performance in cases with distribution shifts. We also perform a detailed ablation study for a deeper understanding of the proposed algorithm. The code and the trained models are available at https://github.com/JayanthRR/ConCURL_NCE.

LGSep 10, 2025
Value bounds and Convergence Analysis for Averages of LRP attributions

Alexander Binder, Nastaran Takmil-Homayouni, Urun Dogan

We analyze numerical properties of Layer-wise relevance propagation (LRP)-type attribution methods by representing them as a product of modified gradient matrices. This representation creates an analogy to matrix multiplications of Jacobi-matrices which arise from the chain rule of differentiation. In order to shed light on the distribution of attribution values, we derive upper bounds for singular values. Furthermore we derive component-wise bounds for attribution map values. As a main result, we apply these component-wise bounds to obtain multiplicative constants. These constants govern the convergence of empirical means of attributions to expectations of attribution maps. This finding has important implications for scenarios where multiple non-geometric data augmentations are applied to individual test samples, as well as for Smoothgrad-type attribution methods. In particular, our analysis reveals that the constants for LRP-beta remain independent of weight norms, a significant distinction from both gradient-based methods and LRP-epsilon.

CVOct 3, 2020
Consensus Clustering With Unsupervised Representation Learning

Jayanth Reddy Regatti, Aniket Anand Deshmukh, Eren Manavoglu et al.

Recent advances in deep clustering and unsupervised representation learning are based on the idea that different views of an input image (generated through data augmentation techniques) must either be closer in the representation space, or have a similar cluster assignment. Bootstrap Your Own Latent (BYOL) is one such representation learning algorithm that has achieved state-of-the-art results in self-supervised image classification on ImageNet under the linear evaluation protocol. However, the utility of the learnt features of BYOL to perform clustering is not explored. In this work, we study the clustering ability of BYOL and observe that features learnt using BYOL may not be optimal for clustering. We propose a novel consensus clustering based loss function, and train BYOL with the proposed loss in an end-to-end way that improves the clustering ability and outperforms similar clustering based methods on some popular computer vision datasets.

CVAug 17, 2020
Zero Shot Domain Generalization

Udit Maniyar, Joseph K J, Aniket Anand Deshmukh et al.

Standard supervised learning setting assumes that training data and test data come from the same distribution (domain). Domain generalization (DG) methods try to learn a model that when trained on data from multiple domains, would generalize to a new unseen domain. We extend DG to an even more challenging setting, where the label space of the unseen domain could also change. We introduce this problem as Zero-Shot Domain Generalization (to the best of our knowledge, the first such effort), where the model generalizes across new domains and also across new classes in those domains. We propose a simple strategy which effectively exploits semantic information of classes, to adapt existing DG methods to meet the demands of Zero-Shot Domain Generalization. We evaluate the proposed methods on CIFAR-10, CIFAR-100, F-MNIST and PACS datasets, establishing a strong baseline to foster interest in this new research direction.

CVMar 18, 2020
Self-Supervised Contextual Bandits in Computer Vision

Aniket Anand Deshmukh, Abhimanu Kumar, Levi Boyles et al.

Contextual bandits are a common problem faced by machine learning practitioners in domains as diverse as hypothesis testing to product recommendations. There have been a lot of approaches in exploiting rich data representations for contextual bandit problems with varying degree of success. Self-supervised learning is a promising approach to find rich data representations without explicit labels. In a typical self-supervised learning scheme, the primary task is defined by the problem objective (e.g. clustering, classification, embedding generation etc.) and the secondary task is defined by the self-supervision objective (e.g. rotation prediction, words in neighborhood, colorization, etc.). In the usual self-supervision, we learn implicit labels from the training data for a secondary task. However, in the contextual bandit setting, we don't have the advantage of getting implicit labels due to lack of data in the initial phase of learning. We provide a novel approach to tackle this issue by combining a contextual bandit objective with a self supervision objective. By augmenting contextual bandit learning with self-supervision we get a better cumulative reward. Our results on eight popular computer vision datasets show substantial gains in cumulative reward. We provide cases where the proposed scheme doesn't perform optimally and give alternative methods for better learning in these cases.

MLFeb 18, 2020
Data Transformation Insights in Self-supervision with Clustering Tasks

Abhimanu Kumar, Aniket Anand Deshmukh, Urun Dogan et al.

Self-supervision is key to extending use of deep learning for label scarce domains. For most of self-supervised approaches data transformations play an important role. However, up until now the impact of transformations have not been studied. Furthermore, different transformations may have different impact on the system. We provide novel insights into the use of data transformation in self-supervised tasks, specially pertaining to clustering. We show theoretically and empirically that certain set of transformations are helpful in convergence of self-supervised clustering. We also show the cases when the transformations are not helpful or in some cases even harmful. We show faster convergence rate with valid transformations for convex as well as certain family of non-convex objectives along with the proof of convergence to the original set of optima. We have synthetic as well as real world data experiments. Empirically our results conform with the theoretical insights provided.

CVNov 15, 2019
Label-similarity Curriculum Learning

Urun Dogan, Aniket Anand Deshmukh, Marcin Machura et al.

Curriculum learning can improve neural network training by guiding the optimization to desirable optima. We propose a novel curriculum learning approach for image classification that adapts the loss function by changing the label representation. The idea is to use a probability distribution over classes as target label, where the class probabilities reflect the similarity to the true class. Gradually, this label representation is shifted towards the standard one-hot-encoding. That is, in the beginning minor mistakes are corrected less than large mistakes, resembling a teaching process in which broad concepts are explained first before subtle differences are taught. The class similarity can be based on prior knowledge. For the special case of the labels being natural words, we propose a generic way to automatically compute the similarities. The natural words are embedded into Euclidean space using a standard word embedding. The probability of each class is then a function of the cosine similarity between the vector representations of the class and the true label. The proposed label-similarity curriculum learning (LCL) approach was empirically evaluated using several popular deep learning architectures for image classification tasks applied to five datasets including ImageNet, CIFAR100, and AWA2. In all scenarios, LCL was able to improve the classification accuracy on the test data compared to standard training.

MLMay 24, 2019
A Generalization Error Bound for Multi-class Domain Generalization

Aniket Anand Deshmukh, Yunwen Lei, Srinagesh Sharma et al.

Domain generalization is the problem of assigning labels to an unlabeled data set, given several similar data sets for which labels have been provided. Despite considerable interest in this problem over the last decade, there has been no theoretical analysis in the setting of multi-class classification. In this work, we study a kernel-based learning algorithm and establish a generalization error bound that scales logarithmically in the number of classes, matching state-of-the-art bounds for multi-class classification in the conventional learning setting. We also demonstrate empirically that the proposed algorithm achieves significant performance gains compared to a pooling strategy.

MLNov 21, 2017
Domain Generalization by Marginal Transfer Learning

Gilles Blanchard, Aniket Anand Deshmukh, Urun Dogan et al.

In the problem of domain generalization (DG), there are labeled training data sets from several related prediction problems, and the goal is to make accurate predictions on future unlabeled data sets that are not known to the learner. This problem arises in several applications where data distributions fluctuate because of environmental, technical, or other sources of variation. We introduce a formal framework for DG, and argue that it can be viewed as a kind of supervised learning problem by augmenting the original feature space with the marginal distribution of feature vectors. While our framework has several connections to conventional analysis of supervised learning algorithms, several unique aspects of DG require new methods of analysis. This work lays the learning theoretic foundations of domain generalization, building on our earlier conference paper where the problem of DG was introduced (Blanchard et al., 2011). We present two formal models of data generation, corresponding notions of risk, and distribution-free generalization error analysis. By focusing our attention on kernel methods, we also provide more quantitative results and a universally consistent algorithm. An efficient implementation is provided for this algorithm, which is experimentally compared to a pooling strategy on one synthetic and three real-world data sets.

LGJun 29, 2017
Data-dependent Generalization Bounds for Multi-class Classification

Yunwen Lei, Urun Dogan, Ding-Xuan Zhou et al.

In this paper, we study data-dependent generalization error bounds exhibiting a mild dependency on the number of classes, making them suitable for multi-class learning with a large number of label classes. The bounds generally hold for empirical multi-class risk minimization algorithms using an arbitrary norm as regularizer. Key to our analysis are new structural results for multi-class Gaussian complexities and empirical $\ell_\infty$-norm covering numbers, which exploit the Lipschitz continuity of the loss function with respect to the $\ell_2$- and $\ell_\infty$-norm, respectively. We establish data-dependent error bounds in terms of complexities of a linear function class defined on a finite set induced by training examples, for which we show tight lower and upper bounds. We apply the results to several prominent multi-class learning machines, exhibiting a tighter dependency on the number of classes than the state of the art. For instance, for the multi-class SVM by Crammer and Singer (2002), we obtain a data-dependent bound with a logarithmic dependency which significantly improves the previous square-root dependency. Experimental results are reported to verify the effectiveness of our theoretical findings.

MLMay 24, 2017
Multi-Task Learning for Contextual Bandits

Aniket Anand Deshmukh, Urun Dogan, Clayton Scott

Contextual bandits are a form of multi-armed bandit in which the agent has access to predictive side information (known as the context) for each arm at each time step, and have been used to model personalized news recommendation, ad placement, and other applications. In this work, we propose a multi-task learning framework for contextual bandit problems. Like multi-task learning in the batch setting, the goal is to leverage similarities in contexts for different arms so as to improve the agent's ability to predict rewards from contexts. We propose an upper confidence bound-based multi-task learning algorithm for contextual bandits, establish a corresponding regret bound, and interpret this bound to quantify the advantages of learning in the presence of high task (arm) similarity. We also describe an effective scheme for estimating task similarity from data, and demonstrate our algorithm's performance on several data sets.

MLNov 25, 2016
Distributed Optimization of Multi-Class SVMs

Maximilian Alber, Julian Zimmert, Urun Dogan et al.

Training of one-vs.-rest SVMs can be parallelized over the number of classes in a straight forward way. Given enough computational resources, one-vs.-rest SVMs can thus be trained on data involving a large number of classes. The same cannot be stated, however, for the so-called all-in-one SVMs, which require solving a quadratic program of size quadratically in the number of classes. We develop distributed algorithms for two all-in-one SVM formulations (Lee et al. and Weston and Watkins) that parallelize the computation evenly over the number of classes. This allows us to compare these models to one-vs.-rest SVMs on unprecedented scale. The results indicate superior accuracy on text classification data.