SYMay 6, 2018
Multidimensional Realization Theory and Polynomial System SolvingPhilippe Dreesen, Kim Batselier, Bart De Moor
Multidimensional systems are becoming increasingly important as they provide a promising tool for estimation, simulation and control, while going beyond the traditional setting of one-dimensional systems. The analysis of multidimensional systems is linked to multivariate polynomials, and is therefore more difficult than the well-known analysis of one-dimensional systems, which is linked to univariate polynomials. In the current paper we relate the realization theory for overdetermined autonomous multidimensional systems to the problem of solving a system of polynomial equations. We show that basic notions of linear algebra suffice to analyze and solve the problem. The difference equations are associated with a Macaulay matrix formulation, and it is shown that the null space of the Macaulay matrix is a multidimensional observability matrix. Application of the classical shift trick from realization theory allows for the computation of the corresponding system matrices in a multidimensional state-space setting. This reduces the task of solving a system of polynomial equations to computing an eigenvalue decomposition. We study the occurrence of multiple solutions, as well as the existence and analysis of solutions at infinity, which allow for an interpretation in terms of multidimensional descriptor systems.
LGJan 24, 2023
Multi-view Kernel PCA for Time series ForecastingArun Pandey, Hannes De Meulemeester, Bart De Moor et al.
In this paper, we propose a kernel principal component analysis model for multi-variate time series forecasting, where the training and prediction schemes are derived from the multi-view formulation of Restricted Kernel Machines. The training problem is simply an eigenvalue decomposition of the summation of two kernel matrices corresponding to the views of the input and output data. When a linear kernel is used for the output view, it is shown that the forecasting equation takes the form of kernel ridge regression. When that kernel is non-linear, a pre-image problem has to be solved to forecast a point in the input space. We evaluate the model on several standard time series datasets, perform ablation studies, benchmark with closely related models and discuss its results.
LGJun 29, 2023
Length of Stay prediction for Hospital Management using Domain AdaptationLyse Naomi Wamba Momo, Nyalleng Moorosi, Elaine O. Nsoesie et al.
Inpatient length of stay (LoS) is an important managerial metric which if known in advance can be used to efficiently plan admissions, allocate resources and improve care. Using historical patient data and machine learning techniques, LoS prediction models can be developed. Ethically, these models can not be used for patient discharge in lieu of unit heads but are of utmost necessity for hospital management systems in charge of effective hospital planning. Therefore, the design of the prediction system should be adapted to work in a true hospital setting. In this study, we predict early hospital LoS at the granular level of admission units by applying domain adaptation to leverage information learned from a potential source domain. Time-varying data from 110,079 and 60,492 patient stays to 8 and 9 intensive care units were respectively extracted from eICU-CRD and MIMIC-IV. These were fed into a Long-Short Term Memory and a Fully connected network to train a source domain model, the weights of which were transferred either partially or fully to initiate training in target domains. Shapley Additive exPlanations (SHAP) algorithms were used to study the effect of weight transfer on model explanability. Compared to the benchmark, the proposed weight transfer model showed statistically significant gains in prediction accuracy (between 1% and 5%) as well as computation time (up to 2hrs) for some target domains. The proposed method thus provides an adapted clinical decision support system for hospital management that can ease processes of data access via ethical committee, computation infrastructures and time.
LGApr 28, 2023
Client Recruitment for Federated Learning in ICU Length of Stay PredictionVincent Scheltjens, Lyse Naomi Wamba Momo, Wouter Verbeke et al.
Machine and deep learning methods for medical and healthcare applications have shown significant progress and performance improvement in recent years. These methods require vast amounts of training data which are available in the medical sector, albeit decentralized. Medical institutions generate vast amounts of data for which sharing and centralizing remains a challenge as the result of data and privacy regulations. The federated learning technique is well-suited to tackle these challenges. However, federated learning comes with a new set of open problems related to communication overhead, efficient parameter aggregation, client selection strategies and more. In this work, we address the step prior to the initiation of a federated network for model training, client recruitment. By intelligently recruiting clients, communication overhead and overall cost of training can be reduced without sacrificing predictive performance. Client recruitment aims at pre-excluding potential clients from partaking in the federation based on a set of criteria indicative of their eventual contributions to the federation. In this work, we propose a client recruitment approach using only the output distribution and sample size at the client site. We show how a subset of clients can be recruited without sacrificing model performance whilst, at the same time, significantly improving computation time. By applying the recruitment approach to the training of federated models for accurate patient Length of Stay prediction using data from 189 Intensive Care Units, we show how the models trained in federations made up from recruited clients significantly outperform federated models trained with the standard procedure in terms of predictive power and training time.
SYMar 8, 2018
A Multiple-Input Multiple-Output CepstrumOliver Lauwers, Oscar Mauricio Agudelo, Bart De Moor
This paper extends the concept of scalar cepstrum coefficients from single-input single-output linear time invariant dynamical systems to multiple-input multiple-output models, making use of the Smith-McMillan form of the transfer function. These coefficients are interpreted in terms of poles and transmission zeros of the underlying dynamical system. We present a method to compute the MIMO cepstrum based on input/output signal data for systems with square transfer function matrices (i.e. systems with as many inputs as outputs). This allows us to do a model-free analysis. Two examples to illustrate these results are included: a simple MIMO system with 3 inputs and 3 outputs, of which the poles and zeros are known exactly, that allows us to directly verify the equivalences derived in the paper, and a case study on realistic data. This case study analyses data coming from a (model of) a non-isothermal continuous stirred tank reactor, which experiences linear fouling. We analyse normal and faulty operating behaviour, both with and without a controller present. We show that the cepstrum detects faulty behaviour, even when hidden by controller compensation. The code for the numerical analysis is available online.
56.0NAMay 20
Rectangular Multispectral Perturbation TheoryChristof Vermeersch, Sarthak De, Bart De Moor
We provide a first systematic treatment of so-called rectangular multispectral perturbation theory. With their paper from 2003, Hochstenbach and Plestenjak ["Backward Error, Condition Numbers, and Pseudospectra for the Multiparameter Eigenvalue Problem" in Linear Algebra and its Applications] extended perturbation theory from one-parameter eigenvalue problems to multiple spectral parameters. After two decades, we take it one step further and consider a different manifestation of the multiparameter eigenvalue problem that consists of one matrix equation with rectangular coefficient matrices. We perform a norm-wise backward error analysis, define condition numbers for both eigenvalues and eigenvectors, and introduce the pseudospectrum while also considering the computational implications of working with multiple spectral parameters. The rectangular shape hampers a direct application of the existing definitions and properties. For example, the left null space at a given eigenvalue is non-trivial and the dimensions of the left and right eigenvectors are different. Through numerical examples, we illustrate and link the different concepts from the perturbation theory. A system identification application seem to suggest that, in optimization-driven problems for which multiparameter reformulations exist, the globally optimal solutions tend to coincide with the best-conditioned eigenvalues.
60.3MSMay 20
Solving Multivariate Polynomial Systems and Rectangular Multiparameter Eigenvalue Problems with MacaulayLabChristof Vermeersch, Bart De Moor
We present the Matlab toolbox MacaulayLab, which implements numerical linear algebra algorithms for solving multivariate polynomial systems and rectangular multiparameter eigenvalue problems. Its structure and functionality are the result of several years of research and algorithmic development. We demonstrate how the software works and compare its performance with other software packages, such as PNLA, PHCpack, and MultiParEig. Some core features of MacaulayLab are the fact that it solves two key problems via one common approach, works independently of the chosen polynomial basis and monomial order, and is capable of dealing with positive-dimensional solution sets at infinity. The toolbox (including its future updates) and a large collection of test problems are freely available online.
LGMay 26, 2023
Duality in Multi-View Restricted Kernel MachinesSonny Achten, Arun Pandey, Hannes De Meulemeester et al.
We propose a unifying setting that combines existing restricted kernel machine methods into a single primal-dual multi-view framework for kernel principal component analysis in both supervised and unsupervised settings. We derive the primal and dual representations of the framework and relate different training and inference algorithms from a theoretical perspective. We show how to achieve full equivalence in primal and dual formulations by rescaling primal variables. Finally, we experimentally validate the equivalence and provide insight into the relationships between different methods on a number of time series data sets by recursively forecasting unseen test data and visualizing the learned features.
LGApr 6, 2021
Leverage Score Sampling for Complete Mode Coverage in Generative Adversarial NetworksJoachim Schreurs, Hannes De Meulemeester, Michaël Fanuel et al.
Commonly, machine learning models minimize an empirical expectation. As a result, the trained models typically perform well for the majority of the data but the performance may deteriorate in less dense regions of the dataset. This issue also arises in generative modeling. A generative model may overlook underrepresented modes that are less frequent in the empirical data distribution. This problem is known as complete mode coverage. We propose a sampling procedure based on ridge leverage scores which significantly improves mode coverage when compared to standard methods and can easily be combined with any GAN. Ridge leverage scores are computed by using an explicit feature map, associated with the next-to-last layer of a GAN discriminator or of a pre-trained network, or by using an implicit feature map corresponding to a Gaussian kernel. Multiple evaluations against recent approaches of complete mode coverage show a clear improvement when using the proposed sampling strategy.
LGJun 16, 2020
The Bures Metric for Generative Adversarial NetworksHannes De Meulemeester, Joachim Schreurs, Michaël Fanuel et al.
Generative Adversarial Networks (GANs) are performant generative methods yielding high-quality samples. However, under certain circumstances, the training of GANs can lead to mode collapse or mode dropping, i.e. the generative models not being able to sample from the entire probability distribution. To address this problem, we use the last layer of the discriminator as a feature map to study the distribution of the real and the fake data. During training, we propose to match the real batch diversity to the fake batch diversity by using the Bures distance between covariance matrices in feature space. The computation of the Bures distance can be conveniently done in either feature space or kernel space in terms of the covariance and kernel matrix respectively. We observe that diversity matching reduces mode collapse substantially and has a positive effect on the sample quality. On the practical side, a very simple training procedure, that does not require additional hyperparameter tuning, is proposed and assessed on several datasets.
SYMar 8, 2018
Applicability and interpretation of the deterministic weighted cepstral distanceOliver Lauwers, Bart De Moor
Quantifying similarity between data objects is an important part of modern data science. Deciding what similarity measure to use is very application dependent. In this paper, we combine insights from systems theory and machine learning, and investigate the weighted cepstral distance, which was previously defined for signals coming from ARMA models. We provide an extension of this distance to invertible deterministic linear time invariant single input single output models, and assess its applicability. We show that it can always be interpreted in terms of the poles and zeros of the underlying model, and that, in the case of stable, minimum-phase, or unstable, maximum-phase models, a geometrical interpretation in terms of subspace angles can be given. We then devise a method to assess stability and phase-type of the generating models, using only input/output signal information. In this way, we prove a connection between the extended weighted cepstral distance and a weighted cepstral model norm. In this way, we provide a purely data-driven way to assess different underlying dynamics of input/output signal pairs, without the need for any system identification step. This can be useful in machine learning tasks such as time series clustering. An iPython tutorial is published complementary to this paper, containing implementations of the various methods and algorithms presented here, as well as some numerical illustrations of the equivalences proven here.
SYMar 6, 2017
A time series distance measure for efficient clustering of input output signals by their underlying dynamicsOliver Lauwers, Bart De Moor
Starting from a dataset with input/output time series generated by multiple deterministic linear dynamical systems, this paper tackles the problem of automatically clustering these time series. We propose an extension to the so-called Martin cepstral distance, that allows to efficiently cluster these time series, and apply it to simulated electrical circuits data. Traditionally, two ways of handling the problem are used. The first class of methods employs a distance measure on time series (e.g. Euclidean, Dynamic Time Warping) and a clustering technique (e.g. k-means, k-medoids, hierarchical clustering) to find natural groups in the dataset. It is, however, often not clear whether these distance measures effectively take into account the specific temporal correlations in these time series. The second class of methods uses the input/output data to identify a dynamic system using an identification scheme, and then applies a model norm-based distance (e.g. H2, H-infinity) to find out which systems are similar. This, however, can be very time consuming for large amounts of long time series data. We show that the new distance measure presented in this paper performs as good as when every input/output pair is modelled explicitly, but remains computationally much less complex. The complexity of calculating this distance between two time series of length N is O(N logN).
MLApr 28, 2015
Building Classifiers to Predict the Start of Glucose-Lowering Pharmacotherapy Using Belgian Health Expenditure DataMarc Claesen, Frank De Smet, Pieter Gillard et al.
Early diagnosis is important for type 2 diabetes (T2D) to improve patient prognosis, prevent complications and reduce long-term treatment costs. We present a novel risk profiling approach based exclusively on health expenditure data that is available to Belgian mutual health insurers. We used expenditure data related to drug purchases and medical provisions to construct models that predict whether a patient will start glucose-lowering pharmacotherapy in the coming years, based on that patient's recent medical expenditure history. The design and implementation of the modeling strategy are discussed in detail and several learning methods are benchmarked for our application. Our best performing model obtains between 74.9% and 76.8% area under the ROC curve, which is comparable to state-of-the-art risk prediction approaches for T2D based on questionnaires. In contrast to other methods, our approach can be implemented on a population-wide scale at virtually no extra operational cost. Possibly, our approach can be further improved by additional information about some risk factors of T2D that is unavailable in health expenditure data.
MLApr 26, 2015
Assessing binary classifiers using only positive and unlabeled dataMarc Claesen, Jesse Davis, Frank De Smet et al.
Assessing the performance of a learned model is a crucial part of machine learning. However, in some domains only positive and unlabeled examples are available, which prohibits the use of most standard evaluation metrics. We propose an approach to estimate any metric based on contingency tables, including ROC and PR curves, using only positive and unlabeled data. Estimating these performance metrics is essentially reduced to estimating the fraction of (latent) positives in the unlabeled set, assuming known positives are a random sample of all positives. We provide theoretical bounds on the quality of our estimates, illustrate the importance of estimating the fraction of positives in the unlabeled set and demonstrate empirically that we are able to reliably estimate ROC and PR curves on real data.
LGFeb 7, 2015
Hyperparameter Search in Machine LearningMarc Claesen, Bart De Moor
We introduce the hyperparameter search problem in the field of machine learning and discuss its main challenges from an optimization perspective. Machine learning methods attempt to build models that capture some element of interest based on given data. Most common learning algorithms feature a set of hyperparameters that must be determined before training commences. The choice of hyperparameters can significantly affect the resulting model's performance, but determining good values can be complex; hence a disciplined, theoretically sound search strategy is essential.
LGDec 2, 2014
Easy Hyperparameter Search Using OptunityMarc Claesen, Jaak Simm, Dusan Popovic et al.
Optunity is a free software package dedicated to hyperparameter optimization. It contains various types of solvers, ranging from undirected methods to direct search, particle swarm and evolutionary optimization. The design focuses on ease of use, flexibility, code clarity and interoperability with existing software in all machine learning environments. Optunity is written in Python and contains interfaces to environments such as R and MATLAB. Optunity uses a BSD license and is freely available online at http://www.optunity.net.
MLMar 4, 2014
EnsembleSVM: A Library for Ensemble Learning Using Support Vector MachinesMarc Claesen, Frank De Smet, Johan Suykens et al.
EnsembleSVM is a free software package containing efficient routines to perform ensemble learning with support vector machine (SVM) base models. It currently offers ensemble methods based on binary SVM models. Our implementation avoids duplicate storage and evaluation of support vectors which are shared between constituent models. Experimental results show that using ensemble approaches can drastically reduce training complexity while maintaining high predictive accuracy. The EnsembleSVM software package is freely available online at http://esat.kuleuven.be/stadius/ensemblesvm.
MLMar 4, 2014
Fast Prediction with SVM Models Containing RBF KernelsMarc Claesen, Frank De Smet, Johan A. K. Suykens et al.
We present an approximation scheme for support vector machine models that use an RBF kernel. A second-order Maclaurin series approximation is used for exponentials of inner products between support vectors and test instances. The approximation is applicable to all kernel methods featuring sums of kernel evaluations and makes no assumptions regarding data normalization. The prediction speed of approximated models no longer relates to the amount of support vectors but is quadratic in terms of the number of input dimensions. If the number of input dimensions is small compared to the amount of support vectors, the approximated model is significantly faster in prediction and has a smaller memory footprint. An optimized C++ implementation was made to assess the gain in prediction speed in a set of practical tests. We additionally provide a method to verify the approximation accuracy, prior to training models or during run-time, to ensure the loss in accuracy remains acceptable and within known bounds.
MLFeb 13, 2014
A Robust Ensemble Approach to Learn From Positive and Unlabeled Data Using SVM Base ModelsMarc Claesen, Frank De Smet, Johan A. K. Suykens et al.
We present a novel approach to learn binary classifiers when only positive and unlabeled instances are available (PU learning). This problem is routinely cast as a supervised task with label noise in the negative set. We use an ensemble of SVM models trained on bootstrap resamples of the training data for increased robustness against label noise. The approach can be considered in a bagging framework which provides an intuitive explanation for its mechanics in a semi-supervised setting. We compared our method to state-of-the-art approaches in simulations using multiple public benchmark data sets. The included benchmark comprises three settings with increasing label noise: (i) fully supervised, (ii) PU learning and (iii) PU learning with false positives. Our approach shows a marginal improvement over existing methods in the second setting and a significant improvement in the third.