LG | Oct 10, 2025

A PCA-based Data Prediction Method

arXiv:2510.09246v1 | h-index: 36 | Balt J Mod Comput
Originality: Synthesis-oriented
AI Analysis

This paper addresses the common problem of missing data in data science, but it appears incremental because it builds on existing PCA-based approaches.

The paper tackles the problem of missing data imputation by introducing a method that combines traditional mathematics and machine learning, using distances between shifted linear subspaces based on principal components, with solutions provided for the Euclidean metric.

The problem of choosing appropriate values for missing data is often encountered in data science. We describe a novel method containing both traditional mathematics and machine learning elements for the prediction (imputation) of missing data. This method is based on the notion of distance between shifted linear subspaces representing the existing data and candidate sets. The existing data set is represented by the subspace spanned by its first principal components. Solutions for the case of the Euclidean metric are given.
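The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it simplifies the "distance between shifted linear subspaces" to the Euclidean distance from a single incomplete point to the shifted subspace (mean plus the span of the first k principal components) and picks the missing values that minimize that distance. The function names `fit_affine_pca` and `impute_row` and the parameter `k` are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_affine_pca(X, k):
    """Return (mu, V): the data mean and a (d, k) matrix whose columns
    are the first k principal directions of the complete rows X."""
    mu = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k].T


def impute_row(x, mu, V):
    """Fill the NaN entries of x so that the completed point is as close
    as possible (Euclidean metric) to the shifted subspace mu + span(V).

    Assumption for this sketch: the candidate set is a single point,
    not a subspace as in the paper.
    """
    miss = np.isnan(x)
    if not miss.any():
        return x.copy()
    d = x.size
    # Projector onto the orthogonal complement of span(V); the distance
    # of a point p to the shifted subspace is ||P (p - mu)||.
    P = np.eye(d) - V @ V.T
    # Split p - mu into observed and missing parts and solve the
    # least-squares problem for the missing part z:
    #   minimize || P[:, obs] (x_obs - mu_obs) + P[:, miss] z ||
    A = P[:, miss]
    b = -P[:, ~miss] @ (x[~miss] - mu[~miss])
    z, *_ = np.linalg.lstsq(A, b, rcond=None)
    out = x.copy()
    out[miss] = mu[miss] + z
    return out
```

For example, if the complete rows all lie on one line in three dimensions and k = 1, a row with its last coordinate missing is completed to the point on that line matching the two observed coordinates. With noisy data the result is the least-squares compromise instead, and the choice of k is left to the user in this sketch.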

