George Runger

LG
h-index: 6
5 papers
1,156 citations
Novelty: 55%
AI Score: 31

5 Papers

LG, Apr 16, 2023
Dynamic Exploration-Exploitation Trade-Off in Active Learning Regression with Bayesian Hierarchical Modeling

Upala Junaida Islam, Kamran Paynabar, George Runger et al.

Active learning provides a framework to adaptively query the most informative experiments towards learning an unknown black-box function. Various active learning approaches have been proposed in the literature; however, they focus on either exploration or exploitation in the design space. Methods that do consider exploration and exploitation simultaneously employ fixed or ad hoc measures to control the trade-off, which may not be optimal. In this paper, we develop a Bayesian hierarchical approach, referred to as BHEEM, to dynamically balance the exploration-exploitation trade-off as more data points are queried. To sample from the posterior distribution of the trade-off parameter, we formulate an approximate Bayesian computation approach based on the linear dependence of queried data in the feature space. Simulated and real-world examples show that the proposed approach achieves average improvements of at least 21% and 11% over pure exploration and pure exploitation strategies, respectively. More importantly, we note that by optimally balancing the trade-off between exploration and exploitation, BHEEM performs better than, or at least as well as, either pure exploration or pure exploitation.
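
As a rough illustration of the fixed trade-off that this abstract contrasts with BHEEM, the sketch below blends an exploration score and an exploitation score with a constant weight. The names `pick_query`, `explore`, `exploit`, and `eta` are hypothetical and the scores are made up; BHEEM itself infers the trade-off parameter from a posterior instead of fixing it, and that inference step is not shown here.

```python
def pick_query(candidates, explore, exploit, eta):
    """Return the candidate with the highest blended score.

    explore and exploit map each candidate to a score (higher is better).
    eta in [0, 1] is a fixed weight: 0 is pure exploration, 1 is pure
    exploitation. This fixed weight is an illustrative assumption.
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    return max(candidates, key=lambda c: (1.0 - eta) * explore[c] + eta * exploit[c])


# Toy scores, invented for illustration only.
candidates = ["x1", "x2", "x3"]
explore = {"x1": 0.9, "x2": 0.5, "x3": 0.1}
exploit = {"x1": 0.1, "x2": 0.6, "x3": 0.9}

for eta in (0.0, 0.5, 1.0):
    print(eta, pick_query(candidates, explore, exploit, eta))
```

Changing `eta` changes which candidate is queried, which is why a poorly chosen fixed value can hurt; a dynamic scheme would update it as data arrive.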

AI, Jan 29, 2024
Attention-based Reinforcement Learning for Combinatorial Optimization: Application to Job Shop Scheduling Problem

Jaejin Lee, Seho Kee, Mani Janakiram et al.

Job shop scheduling problems represent a significant and complex class of combinatorial optimization problems, which have traditionally been addressed through either exact or approximate solution methods. However, the practical application of these solutions is often hindered by the complexity of real-world problems. Even with an approximate solution approach, the time required to identify a near-optimal solution can be prohibitively long, and the solutions derived are generally not applicable to new problems. This study proposes an attention-based reinforcement learning method designed specifically for job shop scheduling problems. The method integrates a policy gradient reinforcement learning approach with a modified transformer architecture. A key finding of this research is that learners trained with the proposed method can be reused on larger-scale problems that were not part of the initial training set. Furthermore, empirical evidence demonstrates that our approach surpasses the results of recent studies and outperforms commonly implemented heuristic rules. This suggests that our method offers a promising avenue for future research and practical application in job shop scheduling.

LG, Feb 9, 2013
A Time Series Forest for Classification and Feature Extraction

Houtao Deng, George Runger, Eugene Tuv et al.

We propose a tree ensemble method, referred to as time series forest (TSF), for time series classification. TSF employs a combination of the entropy gain and a distance measure, referred to as the Entrance (entropy and distance) gain, for evaluating the splits. Experimental studies show that the Entrance gain criterion improves the accuracy of TSF. TSF randomly samples features at each tree node, has computational complexity linear in the length of a time series, and can be built using parallel computing techniques such as the multi-core computing used here. The temporal importance curve is also proposed to capture the important temporal characteristics useful for classification. Experimental studies show that TSF using simple features such as mean, standard deviation, and slope outperforms strong competitors such as one-nearest-neighbor classifiers with dynamic time warping, is computationally efficient, and can provide insights into the temporal characteristics.
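
The simple interval features named in this abstract (mean, deviation, slope) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `interval_features` and `sample_intervals` are hypothetical, the population standard deviation is an assumed choice, and the uniform interval sampling is a simplification. The Entrance gain and the tree-building steps are not shown.

```python
import random
import statistics


def interval_features(series, start, end):
    """Return (mean, standard deviation, slope) of series[start:end]."""
    window = series[start:end]
    n = len(window)
    mean = statistics.fmean(window)
    std = statistics.pstdev(window)  # population std: an assumption
    if n < 2:
        return mean, std, 0.0
    # Least-squares slope of the window against time indices 0..n-1.
    t_mean = (n - 1) / 2
    denom = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (x - mean) for t, x in enumerate(window)) / denom
    return mean, std, slope


def sample_intervals(length, count, rng):
    """Draw `count` random (start, end) intervals with at least two points."""
    intervals = []
    for _ in range(count):
        start = rng.randrange(0, length - 1)
        end = rng.randrange(start + 2, length + 1)
        intervals.append((start, end))
    return intervals


series = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]
for start, end in sample_intervals(len(series), 3, random.Random(0)):
    print((start, end), interval_features(series, start, end))
```

Each feature costs time proportional to the interval length, which is consistent with the abstract's emphasis on cheap, interpretable features.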

LG, Sep 28, 2012
Gene selection with guided regularized random forest

Houtao Deng, George Runger

The regularized random forest (RRF) was recently proposed for feature selection by building only one ensemble. In RRF, the features are evaluated on a part of the training data at each tree node. We derive an upper bound for the number of distinct Gini information gain values in a node, and show that many features can share the same information gain at a node with a small number of instances and a large number of features. Therefore, in a node with a small number of instances, RRF is likely to select a feature that is not strongly relevant. Here, an enhanced RRF, referred to as the guided RRF (GRRF), is proposed. In GRRF, the importance scores from an ordinary random forest (RF) are used to guide the feature selection process in RRF. Experiments on 10 gene data sets show that the accuracy of GRRF is, in general, more robust than that of RRF when their parameters change. GRRF is computationally efficient, can select compact feature subsets, and has competitive accuracy compared to RRF, varSelRF, and LASSO logistic regression (with evaluations from an RF classifier). Also, RF applied to the features selected by RRF with the minimal regularization outperforms RF applied to all the features for most of the data sets considered here. Therefore, if accuracy is considered more important than the size of the feature subset, RRF with the minimal regularization may be considered. We use the accuracy of RF, a strong classifier, to evaluate feature selection methods, and illustrate that weak classifiers are less capable of capturing the information contained in a feature subset. Both RRF and GRRF were implemented in the "RRF" R package available at CRAN, the official R package archive.
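
One way the "guiding" step could look is sketched below: importance scores from an ordinary random forest are turned into per-feature penalty coefficients, so that features with low importance have their gain discounted more heavily. The linear blend, the normalization by the largest importance, and the names `guided_penalties`, `gamma`, and `base_penalty` are assumptions made for illustration; the abstract does not state the exact formula.

```python
def guided_penalties(importances, gamma=0.5, base_penalty=1.0):
    """Map each feature to a penalty coefficient in (0, 1].

    importances: feature -> importance score from an ordinary random forest.
    gamma: weight given to the importance scores (0 ignores them).
    base_penalty: coefficient used when importance plays no role.
    A coefficient near 1 means the feature's gain is barely discounted.
    """
    top = max(importances.values())
    if top <= 0:
        # No usable importance information: fall back to the base penalty.
        return {f: base_penalty for f in importances}
    return {
        f: (1.0 - gamma) * base_penalty + gamma * (imp / top)
        for f, imp in importances.items()
    }


# Toy importance scores, invented for illustration only.
penalties = guided_penalties({"gene_a": 4.0, "gene_b": 1.0, "gene_c": 0.0})
for gene, coef in penalties.items():
    print(gene, coef)
```

In a node with few instances, where many features tie on raw gain, such coefficients would break ties in favor of features that the ordinary forest already found important.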

LG, Jan 7, 2012
Feature Selection via Regularized Trees

Houtao Deng, George Runger

We propose a tree regularization framework, which enables many tree models to perform feature selection efficiently. The key idea of the regularization framework is to penalize selecting a new feature for splitting when its gain (e.g., information gain) is similar to that of the features used in previous splits. The regularization framework is applied to random forests and boosted trees here, and can be easily applied to other tree models. Experimental studies show that the regularized trees can select high-quality feature subsets with regard to both strong and weak classifiers. Because tree models can naturally deal with categorical and numerical variables, missing values, different scales between variables, interactions, and nonlinearities, the tree regularization framework provides an effective and efficient feature selection solution for many practical problems.
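
The key idea in this abstract can be sketched as a discounted gain: a feature not yet used in any split has its gain multiplied by a penalty coefficient below 1, so it is chosen only when it clearly beats the features already in use. The multiplicative form, the value 0.8, and the names `regularized_gain` and `choose_feature` are assumptions for illustration, not the authors' code.

```python
def regularized_gain(gain, feature, used, penalty):
    """Discount the gain of a feature that no earlier split has used."""
    return gain if feature in used else penalty * gain


def choose_feature(gains, used, penalty=0.8):
    """Pick the feature with the highest regularized gain and mark it as used.

    gains: feature -> raw gain (e.g., information gain) at the current node.
    used: set of features selected by previous splits; updated in place.
    """
    best = max(gains, key=lambda f: regularized_gain(gains[f], f, used, penalty))
    used.add(best)
    return best


# Toy gains at three successive nodes, invented for illustration only.
used = set()
for gains in ({"a": 0.50, "b": 0.55}, {"a": 0.60, "b": 0.50}, {"a": 0.90, "b": 0.50}):
    print(choose_feature(gains, used), sorted(used))
```

At the second node the new feature has the higher raw gain but loses after the discount, so the selected subset stays compact; at the third node its advantage is large enough to survive the penalty.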