LG Aug 28, 2022
Domain Adaptation Principal Component Analysis: base linear method for learning with out-of-distribution data
Evgeny M Mirkes, Jonathan Bac, Aziz Fouché et al.
Domain adaptation is a popular paradigm in modern machine learning that aims to tackle the problem of divergence (or shift) between the labeled training and validation datasets (source domain) and a potentially large unlabeled dataset (target domain). The task is to embed both datasets into a common space in which the source dataset is informative for training while the divergence between source and target is minimized. The most popular domain adaptation solutions are based on training neural networks that combine classification and adversarial learning modules, which frequently makes them both data-hungry and difficult to train. We present a method called Domain Adaptation Principal Component Analysis (DAPCA) that identifies a linear reduced data representation useful for solving the domain adaptation task. The DAPCA algorithm introduces positive and negative weights between pairs of data points and generalizes the supervised extension of principal component analysis. DAPCA is an iterative algorithm that solves a simple quadratic optimization problem at each iteration. The convergence of the algorithm is guaranteed, and the number of iterations is small in practice. We validate the suggested algorithm on previously proposed benchmarks for solving the domain adaptation task. We also show the benefit of using DAPCA in analyzing single-cell omics datasets in biomedical applications. Overall, DAPCA can serve as a practical preprocessing step in many machine learning applications, leading to reduced dataset representations that take into account possible divergence between source and target domains.
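The abstract describes DAPCA as a generalization of supervised PCA built on positive and negative weights between pairs of points. The sketch below illustrates only that non-iterative building block, not DAPCA itself. It assumes the projection is chosen to maximize the weighted sum of squared projected distances, sum_ij W_ij ||P(x_i - x_j)||^2, which reduces to an eigenproblem. The function names, the weight values (repulsion 1 between classes, attraction `-alpha` within a class) and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np

def weighted_pca(X, W, n_components):
    """Top eigenvectors of Q = sum_ij W_ij (x_i - x_j)(x_i - x_j)^T.

    For symmetric W this equals 2 * X^T (D - W) X, where D is the
    diagonal matrix of row sums of W.
    """
    W = 0.5 * (W + W.T)                      # enforce symmetry
    L = np.diag(W.sum(axis=1)) - W           # Laplacian-like matrix
    Q = 2.0 * X.T @ L @ X
    eigvals, eigvecs = np.linalg.eigh(Q)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order], eigvals[order]

def supervised_weights(y, alpha=1.0):
    """Assumed weighting: +1 between classes, -alpha within a class."""
    same = y[:, None] == y[None, :]
    W = np.where(same, -alpha, 1.0)
    np.fill_diagonal(W, 0.0)
    return W

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
# Classes differ along axis 0; axis 1 carries larger, class-independent noise.
X = np.column_stack([
    4.0 * y + rng.normal(0.0, 0.5, size=100),
    rng.normal(0.0, 3.0, size=100),
])

# All weights equal to 1 recovers ordinary PCA (up to a scale factor).
V_pca, _ = weighted_pca(X, np.ones((100, 100)), 1)
# Class-aware weights favor the direction that separates the classes.
V_sup, _ = weighted_pca(X, supervised_weights(y, alpha=1.0), 1)
```

On this toy data, ordinary PCA follows the high-variance noise axis, while the class-aware weights select the axis along which the classes differ. Handling an unlabeled target domain would require additional source-target weights and the iteration mentioned in the abstract, which this sketch does not attempt.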
NC Oct 20, 2025
Using machine learning methods to predict cognitive age from psychophysiological tests
Daria D. Tyurina, Sergey V. Stasenko, Konstantin V. Lushnikov et al.
This study introduces a novel method for predicting cognitive age using psychophysiological tests. To determine cognitive age, subjects were asked to complete a series of psychological tests measuring various cognitive functions, including reaction time and cognitive conflict, short-term memory, verbal functions, and color and spatial perception. From the completed tests, the average completion time, proportion of correct answers, average absolute delta of the color campimetry test, number of guessed words in the Münsterberg matrix, and other parameters were calculated for each subject. These characteristics were preprocessed and used to train a machine learning regression model that predicts a person's cognitive age. The findings contribute to remote health screening on mobile devices for diagnosing and monitoring cognitive aging.
CV Oct 2, 2025
Error correction in multiclass image classification of facial emotion on unbalanced samples
Andrey A. Lebedev, Victor B. Kazantsev, Sergey V. Stasenko
This paper considers the problem of error correction in multi-class classification of face images on unbalanced samples. The study is based on the analysis of a data frame containing images labeled by seven different emotional states of people of different ages. Particular attention is paid to the problem of class imbalance, in which some emotions significantly prevail over others. To solve the classification problem, a neural network model based on LSTM with an attention mechanism focusing on key areas of the face that are informative for emotion recognition is used. As part of the experiments, the model is trained on all possible configurations of subsets of six classes with subsequent error correction for the seventh class, excluded at the training stage. The results show that correction is possible for all classes, although the degree of success varies: some classes are better restored, others are worse. In addition, on the test sample, when correcting some classes, an increase in key quality metrics for small classes was recorded, which indicates the promise of the proposed approach in solving applied problems related to the search for rare events, for example, in anti-fraud systems. Thus, the proposed method can be effectively applied in facial expression analysis systems and in tasks requiring stable classification under skewed class distribution.
LG Jun 28, 2021
High-dimensional separability for one- and few-shot learning
Alexander N. Gorban, Bogdan Grechuk, Evgeny M. Mirkes et al.
This work is driven by a practical question: corrections of Artificial Intelligence (AI) errors. These corrections should be quick and non-iterative. To solve this problem without modification of a legacy AI system, we propose special 'external' devices, correctors. Elementary correctors consist of two parts: a classifier that separates the situations with high risk of error from the situations in which the legacy AI system works well, and a new decision for situations with potential errors. Input signals for the correctors can be the inputs of the legacy AI system, its internal signals, and outputs. If the intrinsic dimensionality of data is high enough, then the classifiers for correction of a small number of errors can be very simple. According to the blessing of dimensionality effects, even simple and robust Fisher's discriminants can be used for one-shot learning of AI correctors. Stochastic separation theorems provide the mathematical basis for this one-shot learning. However, as the number of correctors needed grows, the cluster structure of data becomes important and a new family of stochastic separation theorems is required. We reject the classical hypothesis of the regularity of the data distribution and assume that the data can have a fine-grained structure with many clusters and peaks in the probability density. New stochastic separation theorems for data with fine-grained structure are formulated and proved. The multi-correctors for granular data are proposed. The advantages of the multi-corrector technology were demonstrated by examples of correcting errors and learning new classes of objects by a deep convolutional neural network on the CIFAR-10 dataset. The key problems of the non-classical high-dimensional data analysis are reviewed together with the basic preprocessing steps including supervised, semi-supervised and domain adaptation Principal Component Analysis.
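The one-shot corrector idea can be illustrated with a minimal sketch, assuming a common form of Fisher separability: after centering and whitening, a single error sample x is separated from another point y when (x, y) <= alpha * (x, x). The helper names, the threshold `alpha = 0.8`, and the Gaussian toy data are hypothetical. The sketch covers a single elementary corrector only, not the multi-correctors for clustered data described in the abstract.

```python
import numpy as np

def fit_whitening(X, eps=1e-8):
    """Estimate the mean and a whitening matrix from data the legacy system handles well."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    W = vecs / np.sqrt(vals + eps)           # scale each eigenvector column
    return mean, W

def make_corrector(x_err, mean, W, alpha=0.8):
    """One-shot linear rule that fires on inputs resembling x_err.

    The rule fires when (z, z_err) > alpha * (z_err, z_err) in whitened
    coordinates, so the error sample itself always triggers it.
    """
    z_err = (x_err - mean) @ W
    threshold = alpha * (z_err @ z_err)

    def fires(X):
        Z = (np.atleast_2d(X) - mean) @ W
        return Z @ z_err > threshold

    return fires

rng = np.random.default_rng(1)
d, n = 200, 2000
X_ok = rng.normal(size=(n, d))               # inputs handled correctly (toy data)
x_err = rng.normal(size=d)                   # a single input that caused an error

mean, W = fit_whitening(X_ok)
fires = make_corrector(x_err, mean, W)

hit_on_error = bool(fires(x_err)[0])         # corrector recognizes the error sample
false_alarms = int(fires(X_ok).sum())        # count of correct inputs flagged
```

In this high-dimensional toy setting the rule is built from a single example without iterative training, and it is expected to flag the error sample while leaving the other points untouched. That behavior depends on dimensionality being high and the data being reasonably regular, which is the regime the abstract refers to.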