Joris Tavernier

AI
3 papers
14 citations
Novelty 45%
AI Score 21

3 Papers

NA Oct 7, 2020
Two-level preconditioning for Ridge Regression

Joris Tavernier, Jaak Simm, Karl Meerbergen et al.

Solving linear systems is often the computational bottleneck in real-life problems. Iterative solvers are the only option due to the complexity of direct algorithms or because the system matrix is not explicitly known. Here, we develop a two-level preconditioner for regularized least squares linear systems involving a feature or data matrix. Variants of this linear system may appear in machine learning applications, such as ridge regression, logistic regression, support vector machines and Bayesian regression. We use clustering algorithms to create a coarser level that preserves the principal components of the covariance or Gram matrix. This coarser level approximates the dominant eigenvectors and is used to build a subspace preconditioner accelerating the Conjugate Gradient method. We observed speed-ups for artificial and real-life data.
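The idea in this abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a ridge system (X^T X + lam I) w = X^T y, a plain Lloyd k-means over the feature columns to build the coarse level, and an additive two-level preconditioner M^{-1} = D^{-1} + P (P^T A P)^{-1} P^T. The names `kmeans_labels`, `two_level_preconditioner`, and `pcg`, the synthetic data, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def kmeans_labels(points, k, iters=20, seed=0):
    """Plain Lloyd k-means; returns one cluster label per row of `points`."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    return labels

def two_level_preconditioner(A, P):
    """Return r -> M^{-1} r with M^{-1} = D^{-1} + P (P^T A P)^{-1} P^T."""
    inv_diag = 1.0 / np.diag(A)      # fine level: Jacobi (diagonal) scaling
    coarse = P.T @ A @ P             # coarse level: Galerkin projection of A
    def apply(r):
        return inv_diag * r + P @ np.linalg.solve(coarse, P.T @ r)
    return apply

def pcg(A, b, apply_prec, tol=1e-10, max_iter=1000):
    """Preconditioned Conjugate Gradient; returns (solution, iterations)."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_prec(r)
    p = z.copy()
    rz = r @ z
    for it in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, it
        z = apply_prec(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

# Synthetic data (assumption): features fall into k groups sharing a latent factor.
rng = np.random.default_rng(0)
n_samples, n_features, k = 200, 60, 5
latent = rng.standard_normal((n_samples, k))
groups = np.arange(n_features) % k
X = latent[:, groups] + 0.1 * rng.standard_normal((n_samples, n_features))
y = rng.standard_normal(n_samples)
lam = 1.0

A = X.T @ X + lam * np.eye(n_features)   # regularized covariance matrix
b = X.T @ y

labels = kmeans_labels(X.T, k)           # cluster the features (columns of X)
P = np.eye(k)[labels]                    # one-hot prolongation, n_features x k
P = P[:, P.sum(axis=0) > 0]              # drop empty clusters so P^T A P stays SPD

w_plain, it_plain = pcg(A, b, lambda r: r)
w_two, it_two = pcg(A, b, two_level_preconditioner(A, P))
w_ref = np.linalg.solve(A, b)
print("CG iterations without / with two-level preconditioner:", it_plain, it_two)
```

The additive form is used here because it is symmetric positive definite by construction, which Conjugate Gradient requires of a preconditioner. Whether it reduces the iteration count depends on how well the clusters capture the dominant eigenvectors of the data at hand.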

CO Sep 25, 2020
Multilevel Gibbs Sampling for Bayesian Regression

Joris Tavernier, Jaak Simm, Adam Arany et al.

Bayesian regression remains a simple but effective tool based on Bayesian inference techniques. For large-scale applications with complicated posterior distributions, Markov Chain Monte Carlo methods are applied. To reduce the well-known computational burden of the Markov Chain Monte Carlo approach for Bayesian regression, we developed a multilevel Gibbs sampler for Bayesian regression of linear mixed models. The level hierarchy of data matrices is created by clustering the features and/or samples of the data matrices. Additionally, the use of correlated samples is investigated for variance reduction to improve the convergence of the Markov Chain. In tests on a diverse set of data sets, speed-up is achieved for almost all of them without significant loss in predictive performance.
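For orientation, the following is a minimal sketch of a plain single-level Gibbs sampler for Bayesian linear regression, the kind of baseline that a multilevel scheme would aim to accelerate. It is not the multilevel sampler from the paper and does not cover linear mixed models. The model (Gaussian prior on the weights with fixed precision `alpha`, Gamma prior on the noise precision `tau`), the function name `gibbs_linear_regression`, and all hyperparameter values are assumptions chosen for illustration.

```python
import numpy as np

def gibbs_linear_regression(X, y, alpha=1.0, a0=1.0, b0=1.0,
                            n_iter=600, burn_in=100, seed=0):
    """Gibbs sampler for y = X w + noise, with w ~ N(0, I/alpha),
    noise precision tau ~ Gamma(shape=a0, rate=b0)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    tau = 1.0
    w_samples, tau_samples = [], []
    for it in range(n_iter):
        # Draw w | tau, y ~ N(mean, precision^{-1})
        precision = tau * XtX + alpha * np.eye(d)
        L = np.linalg.cholesky(precision)            # precision = L L^T
        mean = np.linalg.solve(precision, tau * Xty)
        w = mean + np.linalg.solve(L.T, rng.standard_normal(d))
        # Draw tau | w, y ~ Gamma(a0 + n/2, rate = b0 + ||y - X w||^2 / 2)
        resid = y - X @ w
        tau = rng.gamma(a0 + 0.5 * n, 1.0 / (b0 + 0.5 * (resid @ resid)))
        if it >= burn_in:
            w_samples.append(w)
            tau_samples.append(tau)
    return np.array(w_samples), np.array(tau_samples)

# Synthetic data (assumption): 300 samples, 5 features, noise std 0.5.
rng = np.random.default_rng(1)
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
X = rng.standard_normal((300, 5))
y = X @ w_true + 0.5 * rng.standard_normal(300)

w_samples, tau_samples = gibbs_linear_regression(X, y)
print("posterior mean of w:", w_samples.mean(axis=0))
print("posterior mean of tau:", tau_samples.mean())
```

Each sweep solves a linear system in the precision matrix `tau * XtX + alpha * I`, which is where the cost concentrates for large data matrices and where a coarser level built by clustering could plausibly help.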

AI Sep 14, 2017
Fast semi-supervised discriminant analysis for binary classification of large data-sets

Joris Tavernier, Jaak Simm, Karl Meerbergen et al.

High-dimensional data requires scalable algorithms. We propose and analyze three scalable and related algorithms for semi-supervised discriminant analysis (SDA). These methods are based on Krylov subspace methods which exploit the data sparsity and the shift-invariance of Krylov subspaces. In addition, the problem definition was improved by adding centralization to the semi-supervised setting. The proposed methods are evaluated on an industry-scale data set from a pharmaceutical company to predict compound activity on target proteins. The results show that SDA achieves good predictive performance and our methods only require a few seconds, significantly improving computation time over the previous state of the art.
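The shift-invariance mentioned in this abstract means that the Krylov subspace built from A and b is also the Krylov subspace of A + sI and b for any shift s, so one basis can serve a whole family of shifted systems. The sketch below illustrates this property with a generic Arnoldi process and Galerkin (FOM-style) solves. It is not the SDA algorithm from the paper; the function name `arnoldi`, the random test matrix, and the shift values are assumptions for illustration.

```python
import numpy as np

def arnoldi(A, b, m):
    """m steps of Arnoldi with modified Gram-Schmidt.
    Returns V (n x (m+1)), H ((m+1) x m), beta with A V[:, :m] = V H."""
    n = len(b)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(b)
    V[:, 0] = b / beta
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H, beta

# Random symmetric positive semi-definite test matrix (assumption).
rng = np.random.default_rng(0)
n, m = 100, 20
B = rng.standard_normal((n, n))
A = B.T @ B / n
b = rng.standard_normal(n)

# One Arnoldi run, reused for every shift.
V, H, beta = arnoldi(A, b, m)
rhs = np.zeros(m)
rhs[0] = beta

shifts = [0.1, 1.0, 10.0]
solutions = {}
for s in shifts:
    # Shifting A only adds s to the diagonal of the small projected matrix.
    coeffs = np.linalg.solve(H[:m, :] + s * np.eye(m), rhs)
    solutions[s] = V[:, :m] @ coeffs
    residual = b - (A + s * np.eye(n)) @ solutions[s]
    print(f"shift {s}: relative residual {np.linalg.norm(residual) / beta:.2e}")
```

The expensive part, the matrix-vector products with A, is done once. Each additional shift costs only a small m-by-m solve, which is the kind of saving that makes scanning over regularization parameters cheap.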