LG · Jun 1, 2022
Positive Unlabeled Contrastive Learning
Anish Acharya, Sujay Sanghavi, Li Jing et al. · openai
Self-supervised pretraining on unlabeled data followed by supervised fine-tuning on labeled data is a popular paradigm for learning from limited labeled examples. We extend this paradigm to the classical positive unlabeled (PU) setting, where the task is to learn a binary classifier given only a few labeled positive samples and (often) a large number of unlabeled samples (which could be positive or negative). We first propose a simple extension of the standard InfoNCE family of contrastive losses to the PU setting, and show that it learns superior representations compared to existing unsupervised and supervised approaches. We then develop a simple methodology to pseudo-label the unlabeled samples using a new PU-specific clustering scheme; these pseudo-labels can then be used to train the final (positive vs. negative) classifier. Our method handily outperforms state-of-the-art PU methods on several standard PU benchmark datasets, while not requiring a priori knowledge of any class prior (which is a common assumption in other PU methods). We also provide a simple theoretical analysis that motivates our methods.
SY · Nov 17, 2014
Extending the Concept of Analog Butterworth Filter for Fractional Order Systems
Anish Acharya, Saptarshi Das, Indranil Pan et al.
This paper proposes the design of a Fractional Order (FO) Butterworth filter in the complex w-plane (w = s^q, q being any real number), considering the presence of under-damped, hyper-damped and ultra-damped poles. This is the first attempt to design such fractional Butterworth filters in the complex w-plane instead of the complex s-plane, as is conventionally done for integer order filters. First, the concept of fractional derivatives and the w-plane stability of linear fractional order systems are discussed. A detailed mathematical formulation for the design of a fractional Butterworth-like filter (FBWF) in the w-plane is then presented. Simulation examples are given, along with a practical example of designing the FO Butterworth filter to given frequency-domain specifications, to show the practicability of the proposed formulation.
CD · Nov 29, 2016
Simulation studies on the design of optimum PID controllers to suppress chaotic oscillations in a family of Lorenz-like multi-wing attractors
Saptarshi Das, Anish Acharya, Indranil Pan
Multi-wing chaotic attractors are highly complex nonlinear dynamical systems with a higher number of index-2 equilibrium points. Due to the presence of several equilibrium points, the randomness, and hence the complexity, of the state time series for these multi-wing chaotic systems is much higher than that of the conventional double-wing chaotic attractors. A real-coded Genetic Algorithm (GA) based global optimization framework is adopted in this paper as a common template for designing optimum Proportional-Integral-Derivative (PID) controllers to control the state trajectories of four different multi-wing chaotic systems in the Lorenz family, viz. the Lu system, Chen system, Rucklidge (or Shimizu Morioka) system and Sprott-1 system. Robustness of the control scheme for different initial conditions of the multi-wing chaotic systems is also shown.
CL · Mar 29, 2022
LDKP: A Dataset for Identifying Keyphrases from Long Scientific Documents
Debanjan Mahata, Navneet Agarwal, Dibya Gautam et al.
Identifying keyphrases (KPs) from text documents is a fundamental task in natural language processing and information retrieval. The vast majority of the benchmark datasets for this task are from the scientific domain and contain only the document title and abstract. This limits keyphrase extraction (KPE) and keyphrase generation (KPG) algorithms to identifying keyphrases from human-written summaries that are often very short (approximately 8 sentences). This presents three challenges for real-world applications: human-written summaries are unavailable for most documents, the documents are almost always long, and a high percentage of KPs are found only beyond the limited context of title and abstract. Therefore, we release two extensive corpora mapping KPs of ~1.3M and ~100K scientific articles to their fully extracted text and additional metadata, including publication venue, year, author, field of study, and citations, to facilitate research on this real-world problem.
SY · Oct 30, 2012
Optimized Quality Factor of Fractional Order Analog Filters with Band-Pass and Band-Stop Characteristics
Anindya Pakhira, Saptarshi Das, Anish Acharya et al.
Fractional order (FO) filters with band-pass (BP) and band-stop (BS) characteristics are investigated in this paper; such characteristics cannot be achieved with conventional integer order filters of order less than two. The quality factors for symmetric and asymmetric magnitude responses are optimized using a real-coded Genetic Algorithm (GA) for a user-specified center frequency. The parametric influence of the FO filters on the magnitude response is also illustrated with credible numerical simulations.
SY · Dec 18, 2012
Identification of Nonlinear Systems From the Knowledge Around Different Operating Conditions: A Feed-Forward Multi-Layer ANN Based Approach
Sayan Saha, Saptarshi Das, Anish Acharya et al.
The paper investigates nonlinear system identification using system output data at various linearized operating points. A feed-forward multi-layer Artificial Neural Network (ANN) based approach is used for this purpose and tested for two target applications i.e. nuclear reactor power level monitoring and an AC servo position control system. Various configurations of ANN using different activation functions, number of hidden layers and neurons in each layer are trained and tested to find out the best configuration. The training is carried out multiple times to check for consistency and the mean and standard deviation of the root mean square errors (RMSE) are reported for each configuration.
SY · Dec 31, 2012
Stability Analysis Of Delayed System Using Bodes Integral
Anish Acharya, Debatri Mitra, Kaushik Halder
The PID controller parameters can be adjusted to give a desired frequency response, using Bode's integral formula to adjust the slope of the Nyquist curve in a desired manner. The same idea is applied to plants with time delay. It is also carried out with a new approach: the delay term is approximated as a transfer function using the Padé approximation, and Bode's integral is then used to determine the controller parameters. Both methodologies are demonstrated with MATLAB simulations of representative plants and accompanying PID controllers, and the two are compared. The PID controller parameters are also tuned using a real-coded Genetic Algorithm (GA), and the three methods are compared.
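The abstract does not state which order of Padé approximation is used. As an illustration only, the sketch below assumes the first-order form, exp(-sL) ≈ (1 - sL/2) / (1 + sL/2), and evaluates it on the imaginary axis next to the exact delay response. The function names `pade1_delay` and `exact_delay` are hypothetical and not taken from the paper.

```python
import cmath


def pade1_delay(L, omega):
    """First-order Pade approximation of a delay of L seconds at s = j*omega.

    Assumption: first order is chosen for illustration; the paper may use
    a different order.
    """
    s = 1j * omega
    return (1 - s * L / 2) / (1 + s * L / 2)


def exact_delay(L, omega):
    """Exact frequency response exp(-j*omega*L) of a pure time delay."""
    return cmath.exp(-1j * omega * L)


if __name__ == "__main__":
    L = 1.0
    for omega in (0.1, 1.0, 5.0):
        err = abs(pade1_delay(L, omega) - exact_delay(L, omega))
        print(f"omega={omega:4.1f}  |error|={err:.3e}")
```

The approximation is all-pass (unit magnitude at every frequency), so it only approximates the phase lag of the delay, and it is accurate only while the product of frequency and delay is small.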
IT · Jun 5, 2021 · Code
Neural Distributed Source Coding
Jay Whang, Alliot Nagle, Anish Acharya et al.
Distributed source coding (DSC) is the task of encoding an input in the absence of correlated side information that is only available to the decoder. Remarkably, Slepian and Wolf showed in 1973 that an encoder without access to the side information can asymptotically achieve the same compression rate as when the side information is available to it. While there is vast prior work on this topic, practical DSC has been limited to synthetic datasets and specific correlation structures. Here we present a framework for lossy DSC that is agnostic to the correlation structure and can scale to high dimensions. Rather than relying on hand-crafted source modeling, our method utilizes a conditional Vector-Quantized Variational Autoencoder (VQ-VAE) to learn the distributed encoder and decoder. We evaluate our method on multiple datasets and show that our method can handle complex correlations and achieves state-of-the-art PSNR. Our code is made available at https://github.com/acnagle/neural-dsc.
LG · Apr 1, 2025
Geometric Median Matching for Robust k-Subset Selection from Noisy Data
Anish Acharya, Sujay Sanghavi, Alexandros G. Dimakis et al.
Data pruning, the combinatorial task of selecting a small and representative subset from a large dataset, is crucial for mitigating the enormous computational costs associated with training data-hungry modern deep learning models at scale. Since large-scale data collections are invariably noisy, developing data pruning strategies that remain robust even in the presence of corruption is critical in practice. However, existing data pruning methods often fail under high corruption rates due to their reliance on empirical mean estimation, which is highly sensitive to outliers. In response, we propose Geometric Median (GM) Matching, a novel k-subset selection strategy that leverages the Geometric Median, a robust estimator with an optimal breakdown point of 1/2, to enhance resilience against noisy data. Our method iteratively selects a k-subset such that the mean of the subset approximates the GM of the (potentially) noisy dataset, ensuring robustness even under arbitrary corruption. We provide theoretical guarantees, showing that GM Matching enjoys an improved O(1/k) convergence rate, a quadratic improvement over random sampling, even under arbitrary corruption. Extensive experiments across image classification and image generation tasks demonstrate that GM Matching consistently outperforms existing pruning approaches, particularly in high-corruption settings and at high pruning rates, making it a strong baseline for robust data pruning.
LG · Jun 25, 2024
Geometric Median (GM) Matching for Robust Data Pruning
Anish Acharya, Inderjit S Dhillon, Sujay Sanghavi
Large-scale data collections in the wild are invariably noisy. Thus, developing data pruning strategies that remain robust even in the presence of corruption is critical in practice. In this work, we propose Geometric Median (GM) Matching -- a herding-style greedy algorithm that yields a $k$-subset such that the mean of the subset approximates the geometric median of the (potentially) noisy dataset. Theoretically, we show that GM Matching enjoys an improved $\mathcal{O}(1/k)$ scaling over the $\mathcal{O}(1/\sqrt{k})$ scaling of uniform sampling, while achieving the optimal breakdown point of 1/2 even under arbitrary corruption. Extensive experiments across several popular deep learning benchmarks indicate that GM Matching consistently improves over prior state-of-the-art; the gains become more pronounced at high rates of corruption and aggressive pruning rates, making GM Matching a strong baseline for future research in robust data pruning.
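A rough sketch of the idea, under stated assumptions: the geometric median is computed with Weiszfeld iterations, and the subset is chosen by herding with a plain inner product on the raw feature vectors. Both choices, and the names `geometric_median` and `gm_matching`, are illustrative assumptions rather than the authors' implementation, which may operate in a learned embedding space.

```python
import numpy as np


def geometric_median(X, n_iter=200, tol=1e-8):
    """Approximate the geometric median of the rows of X (Weiszfeld iterations)."""
    mu = X.mean(axis=0)
    for _ in range(n_iter):
        dist = np.linalg.norm(X - mu, axis=1)
        dist = np.maximum(dist, 1e-12)  # guard against division by zero
        w = 1.0 / dist
        new_mu = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


def gm_matching(X, k):
    """Herding-style greedy pick of k distinct rows whose mean tracks the GM.

    Assumption: linear-kernel herding, i.e. pick argmax <theta, x>, then
    update theta <- theta + gm - x.
    """
    gm = geometric_median(X)
    theta = gm.copy()
    available = np.ones(len(X), dtype=bool)
    chosen = []
    for _ in range(k):
        scores = X @ theta
        scores[~available] = -np.inf  # sample without replacement
        i = int(np.argmax(scores))
        chosen.append(i)
        available[i] = False
        theta = theta + gm - X[i]
    return np.array(chosen)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(size=(95, 2))
    outliers = np.full((5, 2), 100.0)  # gross corruption
    X = np.vstack([clean, outliers])
    print("mean:", X.mean(axis=0))
    print("geometric median:", geometric_median(X))
    print("selected indices:", gm_matching(X, 10))
```

The design point is that the matching target is the geometric median rather than the empirical mean, so a minority of grossly corrupted points shifts the target far less.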
LG · Feb 8, 2024
Understanding Contrastive Representation Learning from Positive Unlabeled (PU) Data
Anish Acharya, Li Jing, Bhargav Bhushanam et al.
Pretext Invariant Representation Learning (PIRL) followed by Supervised Fine-Tuning (SFT) has become a standard paradigm for learning with limited labels. We extend this approach to the Positive Unlabeled (PU) setting, where only a small set of labeled positives and a large unlabeled pool, containing both positives and negatives, are available. We study this problem under two regimes: (i) without access to the class prior, and (ii) when the prior is known or can be estimated. We introduce Positive Unlabeled Contrastive Learning (puCL), an unbiased and variance-reducing contrastive objective that judiciously integrates weak supervision from labeled positives into the contrastive loss. When the class prior is known, we propose Positive Unlabeled InfoNCE (puNCE), a prior-aware extension that re-weights unlabeled samples as soft positive-negative mixtures. For downstream classification, we develop a pseudo-labeling algorithm that leverages the structure of the learned embedding space via PU-aware clustering. Our framework is supported by theory, offering bias-variance analysis, convergence insights, and generalization guarantees via augmentation concentration, and is validated empirically across standard PU benchmarks, where it consistently outperforms existing methods, particularly in low-supervision regimes.
CL · Jul 7, 2021
DISCO : efficient unsupervised decoding for discrete natural language problems via convex relaxation
Anish Acharya, Rudrajit Das
In this paper we study test-time decoding, a ubiquitous step in almost all sequential text generation tasks, spanning a wide array of natural language processing (NLP) problems. Our main contribution is to develop a continuous relaxation framework for the combinatorial NP-hard decoding problem and to propose Disco, an efficient algorithm based on standard first-order gradient-based optimization. We provide a tight analysis and show that our proposed algorithm converges linearly to within an $ε$-neighborhood of the optimum. Finally, we perform preliminary experiments on the task of adversarial text generation and show superior performance of Disco over several popular decoding approaches.
LG · Jun 16, 2021
Robust Training in High Dimensions via Block Coordinate Geometric Median Descent
Anish Acharya, Abolfazl Hashemi, Prateek Jain et al.
Geometric median (GM) is a classical method in statistics for achieving a robust estimation of the uncorrupted data; under gross corruption, it achieves the optimal breakdown point of 0.5. However, its computational complexity makes it infeasible for robustifying stochastic gradient descent (SGD) for high-dimensional optimization problems. In this paper, we show that by applying GM to only a judiciously chosen block of coordinates at a time and using a memory mechanism, one can retain the breakdown point of 0.5 for smooth non-convex problems, with non-asymptotic convergence rates comparable to SGD with GM.
CL · Apr 19, 2021
Alexa Conversations: An Extensible Data-driven Approach for Building Task-oriented Dialogue Systems
Anish Acharya, Suranjit Adhikari, Sanchit Agarwal et al.
Traditional goal-oriented dialogue systems rely on various components such as natural language understanding, dialogue state tracking, policy learning and response generation. Training each component requires annotations which are hard to obtain for every new domain, limiting scalability of such systems. Similarly, rule-based dialogue systems require extensive writing and maintenance of rules and do not scale either. End-to-End dialogue systems, on the other hand, do not require module-specific annotations but need a large amount of data for training. To overcome these problems, in this demo, we present Alexa Conversations, a new approach for building goal-oriented dialogue systems that is scalable, extensible as well as data efficient. The components of this system are trained in a data-driven manner, but instead of collecting annotated conversations for training, we generate them using a novel dialogue simulator based on a few seed dialogues and specifications of APIs and entities provided by the developer. Our approach provides out-of-the-box support for natural conversational phenomena like entity sharing across turns or users changing their mind during conversation without requiring developers to provide any such dialogue flows. We exemplify our approach using a simple pizza ordering task and showcase its value in reducing the developer burden for creating a robust experience. Finally, we evaluate our system using a typical movie ticket booking task and show that the dialogue simulator is an essential component of the system that leads to over 50% improvement in turn-level action signature prediction accuracy.
CL · Apr 17, 2021
GupShup: An Annotated Corpus for Abstractive Summarization of Open-Domain Code-Switched Conversations
Laiba Mehnaz, Debanjan Mahata, Rakesh Gosangi et al.
Code-switching is the communication phenomenon where speakers switch between different languages during a conversation. With the widespread adoption of conversational agents and chat platforms, code-switching has become an integral part of written conversations in many multi-lingual communities worldwide. This makes it essential to develop techniques for summarizing and understanding these conversations. Towards this objective, we introduce abstractive summarization of Hindi-English code-switched conversations and develop the first code-switched conversation summarization dataset, GupShup, which contains over 6,831 conversations in Hindi-English and their corresponding human-annotated summaries in English and Hindi-English. We present a detailed account of the entire data collection and annotation processes. We analyze the dataset using various code-switching statistics. We train state-of-the-art abstractive summarization models and report their performance using both automated metrics and human evaluation. Our results show that multi-lingual mBART and multi-view seq2seq models obtain the best performance on the new dataset.
ML · Dec 7, 2020
Faster Non-Convex Federated Learning via Global and Local Momentum
Rudrajit Das, Anish Acharya, Abolfazl Hashemi et al.
We propose \texttt{FedGLOMO}, a novel federated learning (FL) algorithm with an iteration complexity of $\mathcal{O}(ε^{-1.5})$ to converge to an $ε$-stationary point (i.e., $\mathbb{E}[\|\nabla f(\bm{x})\|^2] \leq ε$) for smooth non-convex functions -- under arbitrary client heterogeneity and compressed communication -- compared to the $\mathcal{O}(ε^{-2})$ complexity of most prior works. Our key algorithmic idea that enables achieving this improved complexity is based on the observation that the convergence in FL is hampered by two sources of high variance: (i) the global server aggregation step with multiple local updates, exacerbated by client heterogeneity, and (ii) the noise of the local client-level stochastic gradients. By modeling the server aggregation step as a generalized gradient-type update, we propose a variance-reducing momentum-based global update at the server, which when applied in conjunction with variance-reduced local updates at the clients, enables \texttt{FedGLOMO} to enjoy an improved convergence rate. Moreover, we derive our results under a novel and more realistic client-heterogeneity assumption which we verify empirically -- unlike prior assumptions that are hard to verify. Our experiments illustrate the intrinsic variance reduction effect of \texttt{FedGLOMO}, which implicitly suppresses client-drift in heterogeneous data distribution settings and promotes communication efficiency.
LG · Nov 20, 2020
On the Benefits of Multiple Gossip Steps in Communication-Constrained Decentralized Optimization
Abolfazl Hashemi, Anish Acharya, Rudrajit Das et al.
In decentralized optimization, it is common algorithmic practice to have nodes interleave (local) gradient descent iterations with gossip (i.e. averaging over the network) steps. Motivated by the training of large-scale machine learning models, it is also increasingly common to require that messages be lossy compressed versions of the local parameters. In this paper, we show that, in such compressed decentralized optimization settings, there are benefits to having multiple gossip steps between subsequent gradient iterations, even when the cost of doing so is appropriately accounted for, e.g. by means of reducing the precision of compressed information. In particular, we show that having $O(\log\frac{1}{ε})$ gradient iterations with constant step size, and $O(\log\frac{1}{ε})$ gossip steps between every pair of these iterations, enables convergence to within $ε$ of the optimal value for smooth non-convex objectives satisfying the Polyak-Łojasiewicz condition. This result also holds for smooth strongly convex objectives. To our knowledge, this is the first work that derives convergence results for nonconvex optimization under arbitrary communication compression.
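To make the gossip step concrete, the sketch below runs repeated averaging on a ring network. It is a simplified illustration that omits both message compression and the gradient iterations studied in the paper. The ring topology, the uniform 1/3 weights, and the names `ring_mixing_matrix` and `gossip` are assumptions chosen for the example.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring of n >= 3 nodes.

    Each node averages its own value with those of its two neighbours.
    """
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 1.0 / 3.0
        W[i, (i - 1) % n] = 1.0 / 3.0
        W[i, (i + 1) % n] = 1.0 / 3.0
    return W


def gossip(X, W, steps):
    """Apply `steps` gossip rounds; row i of X holds node i's parameters."""
    for _ in range(steps):
        X = W @ X
    return X


def disagreement(X):
    """Distance of the node parameters from their network-wide average."""
    return np.linalg.norm(X - X.mean(axis=0, keepdims=True))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X0 = rng.normal(size=(8, 3))
    W = ring_mixing_matrix(8)
    for q in (1, 5, 50):
        print(q, "gossip steps -> disagreement", disagreement(gossip(X0, W, q)))
```

Because the mixing matrix is doubly stochastic, every round preserves the network average while shrinking the disagreement between nodes, which is why extra gossip rounds between gradient iterations can pay off.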
IR · Dec 19, 2018
Detecting the Trend in Musical Taste over the Decade -- A Novel Feature Extraction Algorithm to Classify Musical Content with Simple Features
Anish Acharya
This work proposes a novel feature selection algorithm to classify songs into different groups. Classification of musical content is often non-trivial and remains a relatively unexplored area. The main idea of this article is a new feature selection scheme that classifies with high accuracy using a small number of simple but carefully chosen features, making it less prone to over-fitting. The scheme uses a very basic, general idea about the structure of an audio signal, which is generally in the shape of a trapezium. Using this general idea from the music community, we propose three frames to be analyzed for feature extraction for each audio signal -- opening, stanzas and closing -- and establish through extensive experiments that this scheme leads to much more efficient classification with less complex features in a low-dimensional feature space, and is thus also computationally less expensive. A step-by-step analysis of feature extraction, feature ranking and dimensionality reduction using PCA is carried out. The Sequential Forward Selection (SFS) algorithm is used to find the most significant features, both with the raw Fisher Discriminant Ratio (FDR) and with the significant eigenvalues after PCA. During classification, extensive validation and cross-validation are performed in a Monte Carlo manner to ensure the validity of the claims.
LG · Nov 1, 2018
Online Embedding Compression for Text Classification using Low Rank Matrix Factorization
Anish Acharya, Rahul Goel, Angeliki Metallinou et al.
Deep learning models have become state of the art for natural language processing (NLP) tasks; however, deploying these models in production systems poses significant memory constraints. Existing compression methods are either lossy or introduce significant latency. We propose a compression method that leverages low rank matrix factorization during training to compress the word embedding layer, which represents the size bottleneck for most NLP models. Our models are trained, compressed and then further re-trained on the downstream task to recover accuracy while maintaining the reduced size. Empirically, we show that the proposed method can achieve 90% compression with minimal impact on accuracy for sentence classification tasks, and outperforms alternative methods like fixed-point quantization or offline word embedding compression. We also analyze the inference time and storage space for our method through FLOP calculations, showing that we can compress DNN models by a configurable ratio and regain accuracy loss without introducing additional latency compared to fixed-point quantization. Finally, we introduce a novel learning rate schedule, the Cyclically Annealed Learning Rate (CALR), which we empirically demonstrate to outperform other popular adaptive learning rate algorithms on a sentence classification benchmark.
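The compression step can be pictured as replacing a vocabulary-by-dimension embedding matrix E with two thin factors. The sketch below obtains the factors from a truncated SVD, which is one plausible way to initialize them and is an assumption here; the abstract does not specify the factorization procedure, and the re-training stage is not shown. The names `factorize_embedding` and `compression_ratio` are hypothetical.

```python
import numpy as np


def factorize_embedding(E, rank):
    """Split E (vocab x dim) into A (vocab x rank) and B (rank x dim).

    A @ B is the best rank-`rank` approximation of E in Frobenius norm
    (truncated SVD). Assumption: SVD is used only for illustration.
    """
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    A = U[:, :rank] * s[:rank]  # scale the kept left singular vectors
    B = Vt[:rank, :]
    return A, B


def compression_ratio(vocab, dim, rank):
    """Fraction of embedding parameters removed by storing A and B instead of E."""
    return 1.0 - rank * (vocab + dim) / (vocab * dim)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    E = rng.normal(size=(1000, 100))
    A, B = factorize_embedding(E, rank=10)
    rel_err = np.linalg.norm(E - A @ B) / np.linalg.norm(E)
    print("parameters removed:", compression_ratio(1000, 100, 10))
    print("relative reconstruction error:", rel_err)
```

A lookup for word i then becomes `A[i] @ B`, so the rank sets the trade-off between model size and reconstruction error; in the approach described above, the factors would be further trained on the downstream task to recover accuracy.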
CV · May 7, 2016
On Image segmentation using Fractional Gradients-Learning Model Parameters using Approximate Marginal Inference
Anish Acharya, Uddipan Mukherjee, Charless Fowlkes
Estimates of image gradients play a ubiquitous role in image segmentation and classification problems, since gradients directly relate to the boundaries or the edges of a scene. This paper proposes a unified approach to gradient estimation based on fractional calculus that is computationally cheap and readily applicable to any existing algorithm that relies on image gradients. We show experiments on edge detection and image segmentation on the Stanford Backgrounds Dataset, where these improved local gradients outperform the state of the art, achieving an average accuracy of 79.2%.
CY · Dec 16, 2014
Are We Ready for Driver-less Vehicles? Security vs. Privacy- A Social Perspective
Anish Acharya
At this moment, autonomous cars are probably the biggest and most talked-about technology in the robotics research community. In spite of great technological advances over the past few years, a full-fledged autonomous car is still far from reality. This article reviews existing systems and discusses the possibility of computer-vision-enabled driving being superior to LiDAR-based systems. The privacy violations that might arise from autonomous driving are discussed in detail from both a technical and a legal perspective. Through evidence and arguments, the article shows that accurate estimation and efficient solution of the constraint satisfaction problem addressed in the case of autonomous cars are negatively correlated with preserving the privacy of the user. This is a very difficult trade-off, since both are very important aspects and have to be taken into account. The fact that one cannot compromise on the safety of the car makes it inevitable to run into serious privacy concerns that might have adverse social and political effects.
CV · Jun 27, 2014
Template Matching based Object Detection Using HOG Feature Pyramid
Anish Acharya
This article provides a step-by-step development of an object detection scheme that combines a HOG-based feature pyramid with template matching.