LG · Aug 25, 2022
A Globally Convergent Gradient-based Bilevel Hyperparameter Optimization Method
Ankur Sinha, Satender Gunwal, Shivam Kumar
Hyperparameter optimization in machine learning is often achieved using naive techniques that only lead to an approximate set of hyperparameters. Although techniques such as Bayesian optimization perform an intelligent search on a given domain of hyperparameters, they do not guarantee an optimal solution. A major drawback of most of these approaches is that their search domain grows exponentially with the number of hyperparameters, which increases the computational cost and makes them slow. The hyperparameter optimization problem is inherently a bilevel optimization task, and some studies have attempted bilevel solution methodologies for solving it. However, these studies assume a unique set of model weights that minimizes the training loss, an assumption that is generally violated by deep learning architectures. This paper discusses a gradient-based bilevel method that addresses these drawbacks. The proposed method handles continuous hyperparameters; in our experiments, we use the regularization hyperparameter. We theoretically prove that the method converges to the set of optimal hyperparameters. The idea is based on approximating the lower-level optimal value function using Gaussian process regression. As a result, the bilevel problem is reduced to a single-level constrained optimization task that is solved using the augmented Lagrangian method. An extensive computational study on the MNIST and CIFAR-10 datasets with multi-layer perceptron and LeNet architectures confirms the efficiency of the proposed method. A comparative study against grid search, random search, Bayesian optimization, and the Hyperband method on various hyperparameter problems shows that the proposed algorithm converges at a lower computational cost and leads to models that generalize better on the testing set.
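A minimal Python sketch of the value-function idea on a toy problem, assuming NumPy and SciPy are available. Several choices are assumptions, not the paper's implementation: the lower level is ridge regression solved in closed form instead of a trained network; the GP uses a fixed RBF length scale and is fitted over t = log(lambda); the constraint is relaxed by a small tolerance `eps` to absorb surrogate error; and each augmented Lagrangian subproblem is solved with SciPy's L-BFGS-B. All function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def train_loss(lam, w, X, y):
    """Lower-level objective f(lambda, w): mean squared error plus an L2 penalty."""
    return np.mean((X @ w - y) ** 2) + lam * np.dot(w, w)


def train_solve(lam, X, y):
    """Closed-form ridge minimiser of train_loss (a stand-in for training a network)."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)


def fit_gp_mean(ts, vals, length=1.0, jitter=1e-8):
    """Posterior mean of an RBF Gaussian process (sample mean as constant prior mean)."""
    mean = vals.mean()
    K = np.exp(-0.5 * (ts[:, None] - ts[None, :]) ** 2 / length**2)
    alpha = np.linalg.solve(K + jitter * np.eye(len(ts)), vals - mean)

    def predict(t):
        k = np.exp(-0.5 * (t - ts) ** 2 / length**2)
        return mean + k @ alpha

    return predict


def tune_regularization(X, y, Xv, yv, t_bounds=(-4.0, 3.0), n_samples=15,
                        eps=1e-3, t0=-3.0, outer_iters=8):
    """Hypothetical sketch: tune lambda = exp(t) with a GP value-function surrogate
    and an augmented Lagrangian; returns (lambda, w, final constraint value)."""
    # 1. Evaluate the lower-level optimal value phi(lambda) at sampled t = log(lambda)
    #    and fit a GP surrogate phi_hat(t).
    ts = np.linspace(t_bounds[0], t_bounds[1], n_samples)
    phis = np.array([train_loss(np.exp(t), train_solve(np.exp(t), X, y), X, y) for t in ts])
    phi_hat = fit_gp_mean(ts, phis)

    def val_loss(z):  # upper-level objective F: validation MSE of w = z[1:]
        return np.mean((Xv @ z[1:] - yv) ** 2)

    def g(z):  # relaxed value-function constraint: f(lambda, w) - phi_hat(t) - eps <= 0
        return train_loss(np.exp(z[0]), z[1:], X, y) - phi_hat(z[0]) - eps

    # 2. Augmented Lagrangian outer loop on the single-level problem in z = (t, w).
    z = np.concatenate([[t0], train_solve(np.exp(t0), X, y)])
    mu, rho = 0.0, 10.0
    bounds = [t_bounds] + [(None, None)] * X.shape[1]
    for _ in range(outer_iters):
        def aug_lag(z, mu=mu, rho=rho):
            return val_loss(z) + (max(0.0, mu + rho * g(z)) ** 2 - mu**2) / (2.0 * rho)

        z = minimize(aug_lag, z, method="L-BFGS-B", bounds=bounds).x
        mu = max(0.0, mu + rho * g(z))  # multiplier update
        if g(z) <= 1e-8:  # constraint satisfied: stop
            break
        rho *= 10.0  # otherwise tighten the penalty
    return float(np.exp(z[0])), z[1:], float(g(z))


if __name__ == "__main__":
    X = np.array([[1.0], [-1.0], [2.0], [-2.0]])
    lam, w, viol = tune_regularization(X, X[:, 0], X, 0.5 * X[:, 0])
    print(lam, w, viol)
```

On this toy data the validation targets are half the training targets, so the validation-optimal weight 0.5 equals the ridge solution 2.5 / (2.5 + lambda) at lambda = 2.5; the printed lambda should therefore be close to 2.5. Because the constraint only asks the weights to reach the (approximate) optimal training loss, nothing in the sketch relies on the lower-level minimiser being unique.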
AI · Oct 24, 2025
AutoOpt: A Dataset and a Unified Framework for Automating Optimization Problem Solving
Ankur Sinha, Shobhit Arora, Dhaval Pujara
This study presents AutoOpt-11k, a unique image dataset of over 11,000 handwritten and printed mathematical optimization models corresponding to single-objective, multi-objective, multi-level, and stochastic optimization problems that exhibit various types of complexity, such as non-linearity, non-convexity, non-differentiability, discontinuity, and high dimensionality. The labels consist of the LaTeX representation for all images and the modeling-language representation for a subset of them. The dataset was created by 25 experts following ethical data-creation guidelines and verified in two phases to avoid errors. Further, we develop the AutoOpt framework, a machine-learning-based automated approach for solving optimization problems, where the user only needs to provide an image of the formulation and AutoOpt solves it efficiently without any further human intervention. The AutoOpt framework consists of three modules: (i) M1 (Image_to_Text), a deep learning model that performs the Mathematical Expression Recognition (MER) task to generate the LaTeX code corresponding to the optimization formulation in the image; (ii) M2 (Text_to_Text), a small-scale fine-tuned LLM that generates the PYOMO script (optimization modeling language) from the LaTeX code; (iii) M3 (Optimization), a Bilevel Optimization based Decomposition (BOBD) method that solves the optimization formulation described in the PYOMO script. We use the AutoOpt-11k dataset for training and testing the deep learning models employed in AutoOpt. The deep learning model for the MER task (M1) outperforms ChatGPT, Gemini, and Nougat on the BLEU metric. The BOBD method (M3), which is a hybrid approach, yields better results on complex test problems than common approaches such as the interior-point algorithm and the genetic algorithm.
IR · Feb 4, 2025
FinBloom: Knowledge Grounding Large Language Model with Real-time Financial Data
Ankur Sinha, Chaitanya Agarwal, Pekka Malo
Large language models (LLMs) excel at generating human-like responses but often struggle with interactive tasks that require access to real-time information. This limitation poses challenges in finance, where models must access up-to-date information, such as recent news or price movements, to support decision-making. To address this, we introduce Financial Agent, a knowledge-grounding approach that enables LLMs to handle financial queries using real-time text and tabular data. Our contributions are threefold. First, we develop a Financial Context Dataset of over 50,000 financial queries paired with the required context. Second, we train FinBloom 7B, a custom 7-billion-parameter LLM, on 14 million financial news articles from Reuters and Deutsche Presse-Agentur, alongside 12 million Securities and Exchange Commission (SEC) filings. Third, we fine-tune FinBloom 7B on the Financial Context Dataset to serve as a Financial Agent. This agent generates the relevant financial context, enabling efficient real-time data retrieval to answer user queries. By reducing latency and eliminating the need for users to provide accurate data manually, our approach significantly enhances the capability of LLMs to handle dynamic financial tasks. It streamlines real-time financial decision-making, algorithmic trading, and related tasks, and is valuable in contexts with high-velocity data flows.
LG · Jun 30, 2024
A Linear Programming Enhanced Genetic Algorithm for Hyperparameter Tuning in Machine Learning
Ankur Sinha, Paritosh Pankaj
In this paper, we formulate the hyperparameter tuning problem in machine learning as a bilevel program. The bilevel program is solved using a micro genetic algorithm that is enhanced with a linear program. While the genetic algorithm searches over discrete hyperparameters, the linear program enhancement allows a hyper-local search over continuous hyperparameters. The major contribution of this paper is the formulation of a linear program that supports fast search over continuous hyperparameters and can be integrated with any hyperparameter search technique. It can also be applied directly to any trained machine learning or deep learning model for the purpose of fine-tuning. We test the performance of the proposed approach on two datasets, MNIST and CIFAR-10. Our results demonstrate that the linear program enhancement offers significant promise when incorporated with any population-based approach for hyperparameter tuning.
CL · May 20, 2023
SEntFiN 1.0: Entity-Aware Sentiment Analysis for Financial News
Ankur Sinha, Satishwar Kedas, Rishu Kumar et al.
Fine-grained financial sentiment analysis on news headlines is a challenging task that requires human-annotated datasets to achieve high performance. Few studies have addressed the sentiment extraction task in a setting where multiple entities are present in a news headline. To further research in this area, we make publicly available SEntFiN 1.0, a human-annotated dataset of 10,753 news headlines with entity-sentiment annotations, of which 2,847 headlines contain multiple entities, often with conflicting sentiments. We augment the dataset with a database of over 1,000 financial entities and their various representations in news media, amounting to over 5,000 phrases. We propose a framework that extracts entity-relevant sentiments using a feature-based rather than an expression-based approach. For sentiment extraction, we use 12 different learning schemes based on lexicon-based and pre-trained sentence representations, along with five classification approaches. Our experiments indicate that lexicon-based n-gram ensembles compare favourably with pre-trained word embedding schemes such as GloVe. Overall, RoBERTa and finBERT (domain-specific BERT) achieve the highest average accuracy of 94.29% and F1-score of 93.27%. Further, using over 210,000 entity-sentiment predictions, we validate the economic effect of sentiments on aggregate market movements over a long duration.
CL · Sep 9, 2020
Impact of News on the Commodity Market: Dataset and Results
Ankur Sinha, Tanmay Khandait
Over the last few years, machine-learning-based methods have been applied to extract information from news flow in the financial domain. However, this information has mostly taken the form of financial sentiment contained in news headlines, primarily for stock prices. In our current work, we propose that various other dimensions of information, of interest to investors, policy-makers, and other practitioners, can be extracted from news headlines. We propose a framework that extracts information such as past movements and expected directionality in prices, asset comparisons, and other general information that the news refers to. We apply this framework to the commodity "Gold" and train machine learning models using a dataset of 11,412 human-annotated news headlines (released with this study) collected from the period 2000-2019. We conduct experiments to validate the causal effect of news flow on gold prices and observe that the information produced by our framework significantly impacts the future gold price.
LG · Jul 21, 2020
A Gradient-based Bilevel Optimization Approach for Tuning Hyperparameters in Machine Learning
Ankur Sinha, Tanmay Khandait, Raja Mohanty
Hyperparameter tuning is an active area of research in machine learning, where the aim is to identify the hyperparameters that provide the best performance on the validation set. Hyperparameter tuning is often achieved using naive techniques, such as random search and grid search. However, these methods seldom lead to an optimal set of hyperparameters and often become very expensive. In this paper, we propose a bilevel solution method for the hyperparameter optimization problem that does not suffer from the drawbacks of earlier studies. The proposed method is general and can be easily applied to any class of machine learning algorithms. The idea is based on approximating the lower-level optimal value function mapping, which is an important mapping in bilevel optimization and helps reduce the bilevel problem to a single-level constrained optimization task. The single-level constrained optimization problem is solved using the augmented Lagrangian method. We discuss the theory behind the proposed algorithm and perform an extensive computational study on two datasets that confirms the efficiency of the proposed method. A comparative study against grid search, random search, and Bayesian optimization techniques shows that the proposed algorithm is several times faster on problems with one or two hyperparameters. The computational gain is expected to be significantly higher as the number of hyperparameters increases. Most techniques in the literature assume that, for a given hyperparameter, a unique set of model parameters minimizes the loss on the training set. Such an assumption is often violated by deep learning architectures, and the proposed method does not require it.
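The value-function reduction summarized in this abstract can be written out as follows. The notation is assumed for illustration rather than taken from the paper: λ are the hyperparameters, w the model parameters, F the validation (upper-level) loss, f the training (lower-level) loss, φ the lower-level optimal value function, and φ̂ its approximation. The augmented Lagrangian shown is the standard form for a single inequality constraint and may differ from the paper's exact variant.

```latex
\begin{aligned}
&\min_{\lambda,\, w}\; F(\lambda, w)
  \quad\text{s.t.}\quad w \in \operatorname*{arg\,min}_{w'} f(\lambda, w')
  && \text{(bilevel problem)}\\
&\varphi(\lambda) = \min_{w'} f(\lambda, w')
  && \text{(lower-level optimal value function)}\\
&\min_{\lambda,\, w}\; F(\lambda, w)
  \quad\text{s.t.}\quad g(\lambda, w) := f(\lambda, w) - \hat{\varphi}(\lambda) \le 0
  && \text{(single-level reduction)}\\
&\mathcal{L}_\rho(\lambda, w; \mu) = F(\lambda, w)
  + \frac{1}{2\rho}\Big(\max\{0,\ \mu + \rho\, g(\lambda, w)\}^2 - \mu^2\Big),
  \qquad \mu \leftarrow \max\{0,\ \mu + \rho\, g(\lambda, w)\}
  && \text{(augmented Lagrangian)}
\end{aligned}
```

Since f(λ, w) ≥ φ(λ) for every w, the constraint with the exact φ can hold only with equality, that is, exactly when w minimizes the training loss for the given λ. This is why the reformulation needs no assumption that the minimizer is unique.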
OC · May 17, 2017
A Review on Bilevel Optimization: From Classical to Evolutionary Approaches and Applications
Ankur Sinha, Pekka Malo, Kalyanmoy Deb
Bilevel optimization is defined as a mathematical program in which an optimization problem contains another optimization problem as a constraint. These problems have received significant attention from the mathematical programming community. Only limited work exists on bilevel problems using evolutionary computation techniques; however, interest has recently increased owing to the proliferation of practical applications and the potential of evolutionary algorithms in tackling these problems. This paper provides a comprehensive review of bilevel optimization, from the basic principles to solution strategies, both classical and evolutionary. A number of potential application problems are also discussed. To offer readers insights into the prominent developments in the field, we have performed an automated text analysis of an extended list of papers published on bilevel optimization to date. This paper should motivate evolutionary computation researchers to pay more attention to this practical yet challenging area.
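In symbols, one common way to state this definition is shown below. The notation is an assumption for illustration: x_u and x_l are the upper- and lower-level variables, F and G_k the upper-level objective and constraints, and f and g_j the lower-level objective and constraints.

```latex
\begin{aligned}
\min_{x_u \in X_U,\; x_l \in X_L}\;& F(x_u, x_l)\\
\text{s.t.}\;& x_l \in \operatorname*{arg\,min}_{z \in X_L}
  \bigl\{\, f(x_u, z) \;:\; g_j(x_u, z) \le 0,\ j = 1, \dots, J \,\bigr\},\\
& G_k(x_u, x_l) \le 0,\quad k = 1, \dots, K.
\end{aligned}
```

Only a lower-level optimal x_l for the chosen x_u is admissible at the upper level, so every upper-level evaluation implicitly requires solving an optimization problem.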
OC · Aug 17, 2016
Optimal Management of Naturally Regenerating Uneven-aged Forests
Ankur Sinha, Janne Rämö, Pekka Malo et al.
A shift from even-aged to uneven-aged forest management practices leads to a problem rather different from the existing straightforward practice, which follows a rotation cycle of artificial regeneration, thinning of inferior trees, and a clearcut. The lack of realistic models and methods showing how to manage uneven-aged stands in a way that is economically viable and ecologically sustainable makes it difficult to adopt this new management practice. To tackle this problem, we make a two-fold contribution in this paper. The first is an algorithm that can handle a realistic uneven-aged stand management model that is otherwise computationally tedious and intractable. The model considered in this paper is an empirically estimated size-structured ecological model for uneven-aged spruce forests. The second contribution is a sensitivity analysis of the forest model with respect to a number of important parameters, which provides insight into the behavior of the uneven-aged forest model.
CL · Jul 19, 2013
Good Debt or Bad Debt: Detecting Semantic Orientations in Economic Texts
Pekka Malo, Ankur Sinha, Pyry Takala et al.
The use of robo-readers to analyze news texts is an emerging technology trend in computational finance. Recent research has invested substantial effort in developing sophisticated financial polarity lexicons that can be used to investigate how financial sentiments relate to future company performance. However, experience from other fields where sentiment analysis is commonly applied shows that the overall semantic orientation of a sentence may differ from the prior polarity of its individual words. The objective of this article is to investigate how semantic orientations can be better detected in financial and economic news by accommodating the overall phrase-structure information and the domain-specific use of language. Our three main contributions are: (1) establishment of a human-annotated finance phrase-bank, which can be used as a benchmark for training and evaluating alternative models; (2) presentation of a technique to enhance financial lexicons with attributes that help to identify the expected direction of events that affect overall sentiment; (3) development of a linearized phrase-structure model for detecting contextual semantic orientations in financial and economic news texts. The relevance of the newly added lexicon features and the benefit of the proposed learning algorithm are demonstrated in a comparative study against previously used general sentiment models as well as the popular word frequency models used in recent financial studies. The proposed framework is parsimonious and avoids the explosion in feature space caused by the use of conventional n-gram features.
NE · Mar 15, 2013
Efficient Evolutionary Algorithm for Single-Objective Bilevel Optimization
Ankur Sinha, Pekka Malo, Kalyanmoy Deb
Bilevel optimization problems are a class of challenging optimization problems that contain two levels of optimization tasks. In these problems, only the optimal solutions of the lower-level problem are possible feasible candidates for the upper-level problem. This requirement makes the problem difficult to solve and has kept researchers busy devising methodologies that can handle it efficiently. Despite these efforts, hardly any effective methodology exists that is capable of handling a complex bilevel problem. In this paper, we introduce the bilevel evolutionary algorithm based on quadratic approximations (BLEAQ) of the optimal lower-level variables with respect to the upper-level variables. The approach is capable of handling bilevel problems with different kinds of complexities in a relatively small number of function evaluations. Ideas from classical optimization have been hybridized with evolutionary methods to generate an efficient optimization algorithm for generic bilevel problems. The efficacy of the algorithm is shown on two sets of test problems: the first is the recently proposed SMD test set, which contains problems with controllable complexities, and the second contains standard test problems collected from the literature. The proposed method has been evaluated against two benchmarks, and the performance gain is observed to be significant.
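The core idea of approximating the optimal lower-level variables as a quadratic function of the upper-level variables can be illustrated with a minimal NumPy/SciPy sketch. Assumptions, all hypothetical and not BLEAQ itself: the lower-level problem is a made-up toy whose optimum is known in closed form (x_u1² + x_u1·x_u2 − 2·x_u2 + 1), SciPy's BFGS stands in for the lower-level search, and a single global least-squares quadratic is fitted. BLEAQ's evolutionary operators, archive management, and rules for when to trust the approximation are not reproduced.

```python
import numpy as np
from scipy.optimize import minimize


def lower_objective(x_l, x_u):
    """Hypothetical lower-level objective with optimum x_u1^2 + x_u1*x_u2 - 2*x_u2 + 1."""
    target = x_u[0] ** 2 + x_u[0] * x_u[1] - 2.0 * x_u[1] + 1.0
    return float((x_l[0] - target) ** 2)


def solve_lower(x_u):
    """Solve the lower level for fixed upper-level variables (stand-in for a lower-level search)."""
    return minimize(lower_objective, x0=np.zeros(1), args=(x_u,), method="BFGS").x[0]


def quad_features(x_u):
    """Full quadratic basis in two upper-level variables: 1, x1, x2, x1^2, x1*x2, x2^2."""
    x1, x2 = x_u
    return np.array([1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2])


def fit_quadratic_map(samples):
    """Least-squares fit of x_l*(x_u) ~ theta . quad_features(x_u) from solved samples."""
    Phi = np.array([quad_features(x_u) for x_u in samples])
    targets = np.array([solve_lower(x_u) for x_u in samples])
    theta, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    return lambda x_u: float(quad_features(x_u) @ theta)


rng = np.random.default_rng(0)
samples = rng.uniform(-2.0, 2.0, size=(10, 2))  # upper-level points with solved lower levels
approx = fit_quadratic_map(samples)

x_new = np.array([0.5, -1.0])  # the exact lower-level optimum here is 2.75
print(approx(x_new), solve_lower(x_new))
```

Once fitted, the quadratic map gives a lower-level estimate for a new upper-level point without another lower-level optimization run, which is where the savings in function evaluations come from.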
CO · Mar 28, 2012
A Multi-objective Exploratory Procedure for Regression Model Selection
Ankur Sinha, Pekka Malo, Timo Kuosmanen
Variable selection is recognized as one of the most critical steps in statistical modeling. The problems encountered in engineering and the social sciences are commonly characterized by an over-abundance of explanatory variables, non-linearities, and unknown interdependencies between the regressors. An added difficulty is that analysts may have little or no prior knowledge of the relative importance of the variables. To provide a robust method for model selection, this paper introduces the Multi-objective Genetic Algorithm for Variable Selection (MOGA-VS), which provides the user with an optimal set of regression models for a given dataset. The algorithm treats the regression problem as a two-objective task and explores the Pareto-optimal (best subset) models by preferring models that have fewer regression coefficients and a better goodness of fit. The model exploration can be based on in-sample or generalization error minimization. Model selection is performed in two steps. First, the frontier of Pareto-optimal regression models is generated by eliminating the dominated models without any user intervention. Second, a decision-making process is executed that allows the user to choose the most preferred model using visualisations and simple metrics. The method has been evaluated on a recently published real dataset on Communities and Crime within the United States.
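The first step, building the frontier of non-dominated regression models from the two objectives (number of regressors and in-sample error), can be sketched in NumPy. This is an assumption-laden illustration, not MOGA-VS: exhaustive enumeration of every variable subset stands in for the genetic algorithm (practical only for a handful of variables), goodness of fit is measured as the OLS sum of squared residuals with an intercept, and the function names are hypothetical.

```python
import itertools

import numpy as np


def subset_error(X, y, cols):
    """In-sample sum of squared residuals of OLS (with intercept) on the chosen columns."""
    design = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)


def pareto_models(X, y):
    """Enumerate every non-empty subset and keep the non-dominated (size, error) models."""
    models = []
    for k in range(1, X.shape[1] + 1):
        for cols in itertools.combinations(range(X.shape[1]), k):
            models.append((cols, k, subset_error(X, y, cols)))

    def dominates(a, b):
        # a dominates b: no worse in both objectives and strictly better in one.
        return a[1] <= b[1] and a[2] <= b[2] and (a[1] < b[1] or a[2] < b[2])

    return [m for m in models if not any(dominates(o, m) for o in models)]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 4))
    y = 2.0 * X[:, 0] - 3.0 * X[:, 2] + 0.01 * rng.normal(size=50)
    for cols, k, err in pareto_models(X, y):
        print(k, cols, round(err, 4))
```

The second step in the abstract, choosing one model from this frontier, is left to the user; the sketch only removes dominated models.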