Vadim Sokolov

ML
h-index4
19papers
261citations
Novelty36%
AI Score46

19 Papers

SYJan 5, 2017
Platoon formation maximization through centralized routing and departure time coordination

Vadim Sokolov, Jeffrey Larson, Todd Munson et al.

Platooning allows vehicles to travel with small intervehicle distance in a coordinated fashion thanks to vehicle-to-vehicle connectivity. When applied at a larger scale, platooning will create significant opportunities for energy savings due to reduced aerodynamic drag, as well as increased road capacity and congestion reduction resulting from shorter vehicle headways. However, these potential savings are maximized if platooning-capable vehicles spend most of their travel time within platoons. Ad hoc platoon formation may not ensure a high rate of platoon driving. In this paper we consider the problem of central coordination of platooning-capable vehicles. By coordinating their routes and departure times, we can maximize the fuel savings afforded by platooning vehicles. The resulting problem is a combinatorial optimization problem that considers the platoon coordination and vehicle routing problems simultaneously. We demonstrate our methodology by evaluating the benefits of a coordinated solution and comparing it with the uncoordinated case when platoons form only in an ad hoc manner. We compare the coordinated and uncoordinated scenarios on a grid network with different assumptions about demand and the time vehicles are willing to wait.

AIJul 8, 2023
The Value of Chess Squares

Aditya Gupta, Shiva Maharaj, Nicholas Polson et al.

We propose a neural network-based approach to calculate the value of a chess square-piece combination. Our model takes a triplet (Color, Piece, Square) as an input and calculates a value that measures the advantage/disadvantage of having this piece on this square. Our methods build on recent advances in chess AI, and can accurately assess the worth of positions in a game of chess. The conventional approach assigns fixed values to pieces $(\symking=\infty, \symqueen=9, \symrook=5, \symbishop=3, \symknight=3, \sympawn=1)$. We enhance this analysis by introducing marginal valuations. We use deep Q-learning to estimate the parameters of our model. We demonstrate our method by examining the positioning of Knights and Bishops, and also provide valuable insights into the valuation of pawns. Finally, we conclude by suggesting potential avenues for future research.

MLAug 17, 2022
Quantum Bayesian Computation

Nick Polson, Vadim Sokolov, Jianeng Xu

Quantum Bayesian Computation (QBC) is an emerging field that levers the computational gains available from quantum computers to provide an exponential speed-up in Bayesian computation. Our paper adds to the literature in two ways. First, we show how von Neumann quantum measurement can be used to simulate machine learning algorithms such as Markov chain Monte Carlo (MCMC) and Deep Learning (DL) that are fundamental to Bayesian learning. Second, we describe data encoding methods needed to implement quantum machine learning including the counterparts to traditional feature extraction and kernel embeddings methods. Our goal then is to show how to apply quantum algorithms directly to statistical machine learning problems. On the theoretical side, we provide quantum versions of high dimensional regression, Gaussian processes (Q-GP) and stochastic gradient descent (Q-SGD). On the empirical side, we apply a Quantum FFT model to Chicago housing data. Finally, we conclude with directions for future research.

MLJan 14
Horseshoe Mixtures-of-Experts (HS-MoE)

Nick Polson, Vadim Sokolov

Horseshoe mixtures-of-experts (HS-MoE) models provide a Bayesian framework for sparse expert selection in mixture-of-experts architectures. We combine the horseshoe prior's adaptive global-local shrinkage with input-dependent gating, yielding data-adaptive sparsity in expert usage. Our primary methodological contribution is a particle learning algorithm for sequential inference, in which the filter is propagated forward in time while tracking only sufficient statistics. We also discuss how HS-MoE relates to modern mixture-of-experts layers in large language models, which are deployed under extreme sparsity constraints (e.g., activating a small number of experts per token out of a large pool).

LGFeb 24
Generative Bayesian Computation as a Scalable Alternative to Gaussian Process Surrogates

Nick Polson, Vadim Sokolov

Gaussian process (GP) surrogates are the default tool for emulating expensive computer experiments, but cubic cost, stationarity assumptions, and Gaussian predictive distributions limit their reach. We propose Generative Bayesian Computation (GBC) via Implicit Quantile Networks (IQNs) as a surrogate framework that targets all three limitations. GBC learns the full conditional quantile function from input--output pairs; at test time, a single forward pass per quantile level produces draws from the predictive distribution. Across fourteen benchmarks we compare GBC to four GP-based methods. GBC improves CRPS by 11--26\% on piecewise jump-process benchmarks, by 14\% on a ten-dimensional Friedman function, and scales linearly to 90,000 training points where dense-covariance GPs are infeasible. A boundary-augmented variant matches or outperforms Modular Jump GPs on two-dimensional jump datasets (up to 46\% CRPS improvement). In active learning, a randomized-prior IQN ensemble achieves nearly three times lower RMSE than deep GP active learning on Rocket LGBB. Overall, GBC records a favorable point estimate in 12 of 14 comparisons. GPs retain an edge on smooth surfaces where their smoothness prior provides effective regularization.

CODec 24, 2024
Generative Modeling: A Review

Maria Nareklishvili, Nick Polson, Vadim Sokolov

Generative methods (Gen-AI) are reviewed with a particular goal of solving tasks in machine learning and Bayesian inference. Generative models require one to simulate a large training dataset and to use deep neural networks to solve a supervised learning problem. To do this, we require high-dimensional regression methods and tools for dimensionality reduction (a.k.a. feature selection). The main advantage of Gen-AI methods is their ability to be model-free and to use deep neural networks to estimate conditional densities or posterior quintiles of interest. To illustrate generative methods , we analyze the well-known Ebola data set. Finally, we conclude with directions for future research.

COFeb 15
Fast Compute for ML Optimization

Nick Polson, Vadim Sokolov

We study optimization for losses that admit a variance-mean scale-mixture representation. Under this representation, each EM iteration is a weighted least squares update in which latent variables determine observation and parameter weights; these play roles analogous to Adam's second-moment scaling and AdamW's weight decay, but are derived from the model. The resulting Scale Mixture EM (SM-EM) algorithm removes user-specified learning-rate and momentum schedules. On synthetic ill-conditioned logistic regression benchmarks with $p \in \{20, \ldots, 500\}$, SM-EM with Nesterov acceleration attains up to $13\times$ lower final loss than Adam tuned by learning-rate grid search. For a 40-point regularization path, sharing sufficient statistics across penalty values yields a $10\times$ runtime reduction relative to the same tuned-Adam protocol. For the base (non-accelerated) algorithm, EM monotonicity guarantees nonincreasing objective values; adding Nesterov extrapolation trades this guarantee for faster empirical convergence.

MLJul 9, 2025
Bayesian Double Descent

Nick Polson, Vadim Sokolov

Double descent is a phenomenon of over-parameterized statistical models such as deep neural networks which have a re-descending property in their risk function. As the complexity of the model increases, risk exhibits a U-shaped region due to the traditional bias-variance trade-off, then as the number of parameters equals the number of observations and the model becomes one of interpolation where the risk can be unbounded and finally, in the over-parameterized region, it re-descends -- the double descent effect. Our goal is to show that this has a natural Bayesian interpretation. We also show that this is not in conflict with the traditional Occam's razor -- simpler models are preferred to complex ones, all else being equal. Our theoretical foundations use Bayesian model selection, the Dickey-Savage density ratio, and connect generalized ridge regression and global-local shrinkage methods with double descent. We illustrate our approach for high dimensional neural networks and provide detailed treatments of infinite Gaussian means models and non-parametric regression. Finally, we conclude with directions for future research.

SRMar 23, 2025
Generative AI for Validating Physics Laws

Maria Nareklishvili, Nicholas Polson, Vadim Sokolov

We present generative artificial intelligence (AI) to empirically validate fundamental laws of physics, focusing on the Stefan-Boltzmann law linking stellar temperature and luminosity. Our approach simulates counterfactual luminosities under hypothetical temperature regimes for each individual star and iteratively refines the temperature-luminosity relationship in a deep learning architecture. We use Gaia DR3 data and find that, on average, temperature's effect on luminosity increases with stellar radius and decreases with absolute magnitude, consistent with theoretical predictions. By framing physics laws as causal problems, our method offers a novel, data-driven approach to refine theoretical understanding and inform evidence-based policy and practice.

LGJan 1, 2025
Kolmogorov GAM Networks are all you need!

Sarah Polson, Vadim Sokolov

Kolmogorov GAM (K-GAM) networks are shown to be an efficient architecture for training and inference. They are an additive model with an embedding that is independent of the function of interest. They provide an alternative to the transformer architecture. They are the machine learning version of Kolmogorov's Superposition Theorem (KST) which provides an efficient representations of a multivariate function. Such representations have use in machine learning for encoding dictionaries (a.k.a. "look-up" tables). KST theory also provides a representation based on translates of the Köppen function. The goal of our paper is to interpret this representation in a machine learning context for applications in Artificial Intelligence (AI). Our architecture is equivalent to a topological embedding which is independent of the function together with an additive layer that uses a Generalized Additive Model (GAM). This provides a class of learning procedures with far fewer parameters than current deep learning algorithms. Implementation can be parallelizable which makes our algorithms computationally attractive. To illustrate our methodology, we use the Iris data from statistical learning. We also show that our additive model with non-linear embedding provides an alternative to transformer architectures which from a statistical viewpoint are kernel smoothers. Additive KAN models therefore provide a natural alternative to transformers. Finally, we conclude with directions for future research.

MLOct 10, 2023
Deep Learning: A Tutorial

Nick Polson, Vadim Sokolov

Our goal is to provide a review of deep learning methods which provide insight into structured high-dimensional data. Rather than using shallow additive architectures common to most statistical models, deep learning uses layers of semi-affine input transformations to provide a predictive rule. Applying these layers of transformations leads to a set of attributes (or, features) to which probabilistic statistical methods can be applied. Thus, the best of both worlds can be achieved: scalable prediction rules fortified with uncertainty quantification, where sparse regularization finds the features.

LGDec 14, 2021
Deep Generative Models for Vehicle Speed Trajectories

Farnaz Behnia, Dominik Karbowski, Vadim Sokolov

Generating realistic vehicle speed trajectories is a crucial component in evaluating vehicle fuel economy and in predictive control of self-driving cars. Traditional generative models rely on Markov chain methods and can produce accurate synthetic trajectories but are subject to the curse of dimensionality. They do not allow to include conditional input variables into the generation process. In this paper, we show how extensions to deep generative models allow accurate and scalable generation. Proposed architectures involve recurrent and feed-forward layers and are trained using adversarial techniques. Our models are shown to perform well on generating vehicle trajectories using a model trained on GPS data from Chicago metropolitan area.

MEOct 22, 2021
Merging Two Cultures: Deep and Statistical Learning

Anindya Bhadra, Jyotishka Datta, Nick Polson et al.

Merging the two cultures of deep and statistical learning provides insights into structured high-dimensional data. Traditional statistical modeling is still a dominant strategy for structured tabular data. Deep learning can be viewed through the lens of generalized linear models (GLMs) with composite link functions. Sufficient dimensionality reduction (SDR) and sparsity performs nonlinear feature engineering. We show that prediction, interpolation and uncertainty quantification can be achieved using probabilistic methods at the output layer of the model. Thus a general framework for machine learning arises that first generates nonlinear features (a.k.a factors) via sparse regularization and stochastic gradient optimisation and second uses a stochastic output layer for predictive uncertainty. Rather than using shallow additive architectures as in many statistical models, deep learning uses layers of semi affine input transformations to provide a predictive rule. Applying these layers of transformations leads to a set of attributes (a.k.a features) to which predictive statistical methods can be applied. Thus we achieve the best of both worlds: scalability and fast predictive rule construction together with uncertainty quantification. Sparse regularisation with un-supervised or supervised learning finds the features. We clarify the duality between shallow and wide models such as PCA, PPR, RRR and deep but skinny architectures such as autoencoders, MLPs, CNN, and LSTM. The connection with data transformations is of practical importance for finding good network architectures. By incorporating probabilistic components at the output level we allow for predictive uncertainty. For interpolation we use deep Gaussian process and ReLU trees for classification. We provide applications to regression, classification and interpolation. Finally, we conclude with directions for future research.

LGJun 11, 2019
Solving Large-Scale 0-1 Knapsack Problems and its Application to Point Cloud Resampling

Duanshun Li, Jing Liu, Noseong Park et al.

0-1 knapsack is of fundamental importance in computer science, business, operations research, etc. In this paper, we present a deep learning technique-based method to solve large-scale 0-1 knapsack problems where the number of products (items) is large and/or the values of products are not necessarily predetermined but decided by an external value assignment function during the optimization process. Our solution is greatly inspired by the method of Lagrange multiplier and some recent adoptions of game theory to deep learning. After formally defining our proposed method based on them, we develop an adaptive gradient ascent method to stabilize its optimization process. In our experiments, the presented method solves all the large-scale benchmark KP instances in a minute whereas existing methods show fluctuating runtime. We also show that our method can be used for other applications, including but not limited to the point cloud resampling.

LGAug 26, 2018
Deep Learning: Computational Aspects

Nicholas Polson, Vadim Sokolov

In this article we review computational aspects of Deep Learning (DL). Deep learning uses network architectures consisting of hierarchical layers of latent variables to construct predictors for high-dimensional input-output models. Training a deep learning architecture is computationally intensive, and efficient linear algebra libraries is the key for training and inference. Stochastic gradient descent (SGD) optimization and batch sampling are used to learn from massive data sets.

MLAug 16, 2018
Deep Learning for Energy Markets

Michael Polson, Vadim Sokolov

Deep Learning is applied to energy markets to predict extreme loads observed in energy grids. Forecasting energy loads and prices is challenging due to sharp peaks and troughs that arise due to supply and demand fluctuations from intraday system constraints. We propose deep spatio-temporal models and extreme value theory (EVT) to capture theses effects and in particular the tail behavior of load spikes. Deep LSTM architectures with ReLU and $\tanh$ activation functions can model trends and temporal dependencies while EVT captures highly volatile load spikes above a pre-specified threshold. To illustrate our methodology, we use hourly price and demand data from 4719 nodes of the PJM interconnection, and we construct a deep predictor. We show that DL-EVT outperforms traditional Fourier time series methods, both in-and out-of-sample, by capturing the observed nonlinearities in prices. Finally, we conclude with directions for future research.

MLJun 14, 2018
Deep Reinforcement Learning for Dynamic Urban Transportation Problems

Laura Schultz, Vadim Sokolov

We explore the use of deep learning and deep reinforcement learning for optimization problems in transportation. Many transportation system analysis tasks are formulated as an optimization problem - such as optimal control problems in intelligent transportation systems and long term urban planning. Often transportation models used to represent dynamics of a transportation system involve large data sets with complex input-output interactions and are difficult to use in the context of optimization. Use of deep learning metamodels can produce a lower dimensional representation of those relations and allow to implement optimization and reinforcement learning algorithms in an efficient manner. In particular, we develop deep learning models for calibrating transportation simulators and for reinforcement learning to solve the problem of optimal scheduling of travelers on the network.

AIOct 12, 2017
Clusters of Driving Behavior from Observational Smartphone Data

Josh Warren, Jeff Lipkowitz, Vadim Sokolov

Understanding driving behaviors is essential for improving safety and mobility of our transportation systems. Data is usually collected via simulator-based studies or naturalistic driving studies. Those techniques allow for understanding relations between demographics, road conditions and safety. On the other hand, they are very costly and time consuming. Thanks to the ubiquity of smartphones, we have an opportunity to substantially complement more traditional data collection techniques with data extracted from phone sensors, such as GPS, accelerometer gyroscope and camera. We developed statistical models that provided insight into driver behavior in the San Francisco metro area based on tens of thousands of driver logs. We used novel data sources to support our work. We used cell phone sensor data drawn from five hundred drivers in San Francisco to understand the speed of traffic across the city as well as the maneuvers of drivers in different areas. Specifically, we clustered drivers based on their driving behavior. We looked at driver norms by street and flagged driving behaviors that deviated from the norm.

MLJun 1, 2017
Deep Learning: A Bayesian Perspective

Nicholas Polson, Vadim Sokolov

Deep learning is a form of machine learning for nonlinear high dimensional pattern matching and prediction. By taking a Bayesian probabilistic perspective, we provide a number of insights into more efficient algorithms for optimisation and hyper-parameter tuning. Traditional high-dimensional data reduction techniques, such as principal component analysis (PCA), partial least squares (PLS), reduced rank regression (RRR), projection pursuit regression (PPR) are all shown to be shallow learners. Their deep learning counterparts exploit multiple deep layers of data reduction which provide predictive performance gains. Stochastic gradient descent (SGD) training optimisation and Dropout (DO) regularization provide estimation and variable selection. Bayesian regularization is central to finding weights and connections in networks to optimize the predictive bias-variance trade-off. To illustrate our methodology, we provide an analysis of international bookings on Airbnb. Finally, we conclude with directions for future research.