MLApr 7, 2025
SurvSurf: a partially monotonic neural network for first-hitting time prediction of intermittently observed discrete and continuous sequential eventsYichen Kelly Chen, Sören Dittmer, Kinga Bernatowicz et al.
We propose a neural-network based survival model (SurvSurf) specifically designed for direct and simultaneous probabilistic prediction of the first hitting time of sequential events from baseline. Unlike existing models, SurvSurf is theoretically guaranteed to never violate the monotonic relationship between the cumulative incidence functions of sequential events, while allowing nonlinear influence from predictors. It also incorporates implicit truths for unobserved intermediate events in model fitting, and supports both discrete and continuous time and events. We also identified a variant of the Integrated Brier Score (IBS) that showed robust correlation with the mean squared error (MSE) between the true and predicted probabilities by accounting for implied truths about the missing intermediate events. We demonstrated the superiority of SurvSurf compared to modern and traditional predictive survival models in two simulated datasets and two real-world datasets, using MSE, the more robust IBS and by measuring the extent of monotonicity violation.
LGJun 16, 2017
Hidden Talents of the Variational AutoencoderBin Dai, Yu Wang, John Aston et al.
Variational autoencoders (VAE) represent a popular, flexible form of deep generative model that can be stochastically fit to samples from a given random process using an information-theoretic variational bound on the true underlying distribution. Once so-obtained, the model can be putatively used to generate new samples from this distribution, or to provide a low-dimensional latent representation of existing samples. While quite effective in numerous application domains, certain important mechanisms which govern the behavior of the VAE are obfuscated by the intractable integrals and resulting stochastic approximations involved. Moreover, as a highly non-convex model, it remains unclear exactly how minima of the underlying energy relate to original design purposes. We attempt to better quantify these issues by analyzing a series of tractable special cases of increasing complexity. In doing so, we unveil interesting connections with more traditional dimensionality reduction models, as well as an intrinsic yet underappreciated propensity for robustly dismissing sparse outliers when estimating latent manifolds. With respect to the latter, we demonstrate that the VAE can be viewed as the natural evolution of recent robust PCA models, capable of learning nonlinear manifolds of unknown dimension obscured by gross corruptions.
COJun 19, 2014
Divide-and-Conquer with Sequential Monte CarloFredrik Lindsten, Adam M. Johansen, Christian A. Naesseth et al.
We propose a novel class of Sequential Monte Carlo (SMC) algorithms, appropriate for inference in probabilistic graphical models. This class of algorithms adopts a divide-and-conquer approach based upon an auxiliary tree-structured decomposition of the model of interest, turning the overall inferential task into a collection of recursively solved sub-problems. The proposed method is applicable to a broad class of probabilistic graphical models, including models with loops. Unlike a standard SMC sampler, the proposed Divide-and-Conquer SMC employs multiple independent populations of weighted particles, which are resampled, merged, and propagated as the method progresses. We illustrate empirically that this approach can outperform standard methods in terms of the accuracy of the posterior expectation and marginal likelihood approximations. Divide-and-Conquer SMC also opens up novel parallel implementation options and the possibility of concentrating the computational effort on the most challenging sub-problems. We demonstrate its performance on a Markov random field and on a hierarchical logistic regression problem.
PESep 4, 2012
Monotonicity of Fitness Landscapes and Mutation Rate ControlRoman V. Belavkin, Alastair Channon, Elizabeth Aston et al.
A common view in evolutionary biology is that mutation rates are minimised. However, studies in combinatorial optimisation and search have shown a clear advantage of using variable mutation rates as a control parameter to optimise the performance of evolutionary algorithms. Much biological theory in this area is based on Ronald Fisher's work, who used Euclidean geometry to study the relation between mutation size and expected fitness of the offspring in infinite phenotypic spaces. Here we reconsider this theory based on the alternative geometry of discrete and finite spaces of DNA sequences. First, we consider the geometric case of fitness being isomorphic to distance from an optimum, and show how problems of optimal mutation rate control can be solved exactly or approximately depending on additional constraints of the problem. Then we consider the general case of fitness communicating only partial information about the distance. We define weak monotonicity of fitness landscapes and prove that this property holds in all landscapes that are continuous and open at the optimum. This theoretical result motivates our hypothesis that optimal mutation rate functions in such landscapes will increase when fitness decreases in some neighbourhood of an optimum, resembling the control functions derived in the geometric case. We test this hypothesis experimentally by analysing approximately optimal mutation rate control functions in 115 complete landscapes of binding scores between DNA sequences and transcription factors. Our findings support the hypothesis and find that the increase of mutation rate is more rapid in landscapes that are less monotonic (more rugged). We discuss the relevance of these findings to living organisms.