Pauliina Ilmonen

LG
h-index: 13
Papers: 3
Citations: 2
Novelty: 32%
AI Score: 29

3 Papers

NA | Oct 4, 2017
Computation of extremal eigenvalues of high-dimensional lattice-theoretic tensors via tensor-train decompositions

Harri Hakula, Pauliina Ilmonen, Vesa Kaarnioja

This paper lies in the intersection of several fields: number theory, lattice theory, multilinear algebra, and scientific computing. We adapt existing solution algorithms for tensor eigenvalue problems to the tensor-train framework. As an application, we consider eigenvalue problems associated with a class of lattice-theoretic meet and join tensors, which may be regarded as multidimensional extensions of the classically studied meet and join matrices such as GCD and LCM matrices, respectively. In order to effectively apply the solution algorithms, we show that meet tensors have an explicit low-rank tensor-train decomposition with sparse tensor-train cores with respect to the dimension. Moreover, this representation is independent of tensor order, which eliminates the so-called curse of dimensionality from the numerical analysis of these objects and makes the solution of tensor eigenvalue problems tractable with increasing dimensionality and order. For LCM tensors it is shown that a tensor-train decomposition with an a priori known TT rank exists under certain assumptions. We present a series of easily reproducible numerical examples covering tensor eigenvalue and generalized eigenvalue problems that serve as future benchmarks. The numerical results are used to assess the sharpness of existing theoretical estimates.
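To make the low-rank claim concrete, the sketch below checks a classical identity for the simplest meet tensor, the order-d GCD tensor on {1, ..., n}. Because the divisor sum of Euler's totient satisfies sum over k dividing g of phi(k) = g, each entry gcd(i_1, ..., i_d) equals the sum of phi(k) over all k that divide every index. That is a sum of n separable terms with sparse 0/1 factors, and the number of terms does not depend on the order d. The function names are illustrative, and this is a brute-force check of the identity under that assumption, not necessarily the decomposition constructed in the paper.

```python
from functools import reduce
from itertools import product
from math import gcd


def euler_phi(k):
    """Euler's totient: count of 1 <= m <= k with gcd(m, k) == 1."""
    return sum(1 for m in range(1, k + 1) if gcd(m, k) == 1)


def gcd_entry(idx):
    """Entry of the GCD tensor at the (1-based) multi-index idx."""
    return reduce(gcd, idx)


def divisor_sum_entry(idx, n):
    """Same entry written as a sum of n separable terms.

    Term k contributes phi(k) * prod_j E[i_j, k], where the sparse 0/1
    factor E[i, k] is 1 exactly when k divides i.
    """
    return sum(
        euler_phi(k)
        for k in range(1, n + 1)
        if all(i % k == 0 for i in idx)
    )


def identity_holds(n, order):
    """Brute-force comparison over all n**order entries."""
    return all(
        gcd_entry(idx) == divisor_sum_entry(idx, n)
        for idx in product(range(1, n + 1), repeat=order)
    )


if __name__ == "__main__":
    for order in (2, 3, 4):
        print(order, identity_holds(6, order))
```

The number of separable terms stays at n for every order tested, which is the order-independence the abstract refers to; a tensor-train representation can be read off from such a sum by using diagonal cores.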

NI | Oct 21, 2024 | Code
Data Matters: The Case of Predicting Mobile Cellular Traffic

Natalia Vesselinova, Matti Harjula, Pauliina Ilmonen

Accurate predictions of base stations' traffic load are essential to mobile cellular operators and their users, as they support the efficient use of network resources and allow delivery of services that sustain smart cities and roads. Traditionally, cellular network time series have been considered for this prediction task. More recently, exogenous factors such as points of interest and other environmental knowledge have been explored too. In contrast to incorporating external factors, we propose to learn the processes underlying cellular load generation by employing population dynamics data. In this study, we focus on smart roads and use road traffic measures to improve prediction accuracy. Comprehensive experiments demonstrate that by employing road flow and speed, in addition to cellular network metrics, base station load prediction errors can be substantially reduced, by as much as 56.5%. The code, visualizations and extensive results are available at https://github.com/nvassileva/DataMatters.

LG | Sep 2, 2025
Extrapolated Markov Chain Oversampling Method for Imbalanced Text Classification

Aleksi Avela, Pauliina Ilmonen

Text classification is the task of automatically assigning text documents correct labels from a predefined set of categories. In real-life (text) classification tasks, observations and misclassification costs are often unevenly distributed between the classes, a situation known as the problem of imbalanced data. Synthetic oversampling is a popular approach to imbalanced classification. The idea is to generate synthetic observations in the minority class to balance the classes in the training set. Many general-purpose oversampling methods can be applied to text data; however, imbalanced text data poses a number of distinctive difficulties that stem from the unique nature of text compared to other domains. One such factor is that when the sample size of text increases, the sample vocabulary (i.e., feature space) is likely to grow as well. We introduce a novel Markov-chain-based text oversampling method. The transition probabilities are estimated from the minority class but also partly from the majority class, thus allowing the minority feature space to expand in oversampling. We evaluate our approach against prominent oversampling methods and show that it produces highly competitive results in several real data examples, especially when the imbalance is severe.
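The general idea of mixing minority and majority transition estimates can be illustrated with a small first-order word chain. This is a simplified sketch, not the authors' estimator: the function names, the fixed `majority_weight` mixing parameter, and the restart rule at dead ends are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def transition_counts(docs):
    """Count word-to-next-word transitions in whitespace-tokenized docs."""
    counts = defaultdict(lambda: defaultdict(float))
    for doc in docs:
        words = doc.split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1.0
    return counts


def mixed_transitions(minority_docs, majority_docs, majority_weight=0.2):
    """Minority counts plus down-weighted majority counts.

    majority_weight is a hypothetical tuning parameter; a value of 0
    reduces this to a chain estimated from the minority class only.
    """
    mixed = transition_counts(minority_docs)
    for current, row in transition_counts(majority_docs).items():
        for following, count in row.items():
            mixed[current][following] += majority_weight * count
    return mixed


def generate(mixed, start_words, length, rng):
    """Sample one synthetic document of `length` words from the chain."""
    word = rng.choice(start_words)
    out = [word]
    while len(out) < length:
        row = mixed.get(word)
        if not row:
            # Dead end (no observed successor): restart from a start word.
            word = rng.choice(start_words)
        else:
            candidates, weights = zip(*row.items())
            word = rng.choices(candidates, weights=weights, k=1)[0]
        out.append(word)
    return " ".join(out)


if __name__ == "__main__":
    minority = ["cheap pills online now", "buy cheap pills"]
    majority = ["buy fresh bread online", "fresh bread now"]
    chain = mixed_transitions(minority, majority, majority_weight=0.5)
    starts = [doc.split()[0] for doc in minority]
    print(generate(chain, starts, 6, random.Random(0)))
```

Because majority transitions enter with a nonzero weight, a synthetic minority document can contain words that never occurred in the minority training sample, which is the feature-space expansion the abstract describes.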