LG · Feb 3, 2023
On the Analysis of Correlation Between Nominal Data and Numerical Data
Zenon Gniazdowski
The article investigates the possibility of measuring the strength of a linear correlation between nominal data and numerical data. Correlation coefficients were studied both for variables coded with real numbers and for variables coded with complex numbers. For variables coded with real numbers, unambiguous measures of real linear correlation were obtained. For complex coding, it was observed that the resulting complex correlation coefficients change when the phases of the complex numbers used to code classes of elements with equal cardinalities are permuted. It was found that a necessary condition for linear correlation is that the data set can be linearly ordered. Since a linear order is not possible in the set of complex numbers, complex correlation coefficients cannot be used as a measure of linear correlation. For such cases, a workaround was suggested that prevents classes of identical elements in the nominal data set from having equal cardinalities. It consists of correcting the data, analogous to the correction applied during preprocessing or cleaning of data that contain missing values or outliers.
LG · Oct 9, 2023
On the Correlation between Random Variables and their Principal Components
Zenon Gniazdowski
The article attempts to find an algebraic formula describing the correlation coefficients between random variables and the principal components representing them. As a result of the analysis, starting from selected statistics relating to individual random variables, the equivalents of these statistics relating to a set of random variables were presented in the language of linear algebra, using the concepts of vector and matrix. This made it possible, in subsequent steps, to derive the expected formula. The formula found is identical to the formula used in Factor Analysis to calculate factor loadings. The discussion showed that it is possible to apply this formula to optimize the number of principal components in Principal Component Analysis, as well as to optimize the number of factors in Factor Analysis.
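The relationship described in this abstract can be illustrated numerically. The sketch below is not taken from the article: it assumes the standard factor-loading formula (eigenvector of the correlation matrix scaled by the square root of its eigenvalue) and checks it against empirically computed correlations between variables and principal component scores. The function name `pc_loadings` and the synthetic data are hypothetical.

```python
import numpy as np

def pc_loadings(X):
    """Return (loadings, eigenvalues, eigenvectors) for the columns of X.

    loadings[i, j] is the assumed correlation between variable i and
    principal component j: eigenvector entry times sqrt(eigenvalue).
    """
    R = np.corrcoef(X, rowvar=False)          # correlation matrix of the variables
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # reorder to descending
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]
    return eigvecs * np.sqrt(eigvals), eigvals, eigvecs

# Synthetic, illustrative data with some correlated columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
X[:, 1] += 0.8 * X[:, 0]
X[:, 3] -= 0.5 * X[:, 2]

loadings, eigvals, eigvecs = pc_loadings(X)

# Empirical check: correlate standardized variables with component scores.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
scores = Z @ eigvecs
empirical = np.corrcoef(Z, scores, rowvar=False)[:4, 4:]
max_abs_diff = np.abs(loadings - empirical).max()
```

Here `max_abs_diff` should be at the level of floating-point rounding, and the squared loadings of each variable summed over all components equal 1, which is what makes them usable for judging how many components are needed.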
LG · Dec 12, 2024
New Approach to Clustering Random Attributes
Zenon Gniazdowski
This paper proposes a new method for similarity analysis and, consequently, a new algorithm for clustering different types of random attributes, both numerical and nominal. However, in order for nominal attributes to be clustered, their values must be properly encoded. In the encoding process, nominal attributes obtain a new representation in numerical form. Only the numeric attributes can be subjected to factor analysis, which allows them to be clustered in terms of their similarity to factors. The proposed method was tested for several sample datasets. It was found that the proposed method is universal. On the one hand, the method allows clustering of numerical attributes. On the other hand, it provides the ability to cluster nominal attributes. It also allows simultaneous clustering of numerical attributes and numerically encoded nominal attributes.
LG · Oct 21, 2021
Principal Component Analysis versus Factor Analysis
Zenon Gniazdowski
The article discusses selected problems related to both principal component analysis (PCA) and factor analysis (FA). In particular, the two types of analysis are compared, and a vector interpretation of both PCA and FA is proposed. The problem of determining the number of principal components in PCA and the number of factors in FA is discussed in detail. A new criterion for determining the number of factors and principal components is discussed, one that makes it possible to represent most of the variance of each analyzed primary variable. An efficient algorithm for determining the number of factors in FA that complies with this criterion is also proposed. This algorithm was adapted to find the number of principal components in PCA, and a modification of the PCA algorithm using the new method of determining the number of principal components is proposed. The obtained results are discussed.
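One possible reading of the criterion mentioned in this abstract can be sketched as follows. The abstract does not give the algorithm, so everything here is an assumption: "most of the variance" is interpreted as more than a `threshold` of 0.5, the variance of a variable explained by the first k components is taken as the sum of its squared loadings, and the function name `min_components` is hypothetical.

```python
import numpy as np

def min_components(R, threshold=0.5):
    """Smallest k such that the first k components explain more than
    `threshold` of the variance of EVERY variable (assumed criterion)."""
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]              # descending eigenvalues
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]
    loadings = eigvecs * np.sqrt(eigvals)          # variable-component correlations
    explained = np.cumsum(loadings ** 2, axis=1)   # per-variable variance explained by first k
    ok = (explained > threshold).all(axis=0)       # True where all variables pass
    return int(np.argmax(ok)) + 1                  # first k that satisfies the criterion

# Illustrative correlation matrix: two uncorrelated pairs of variables.
R = np.array([
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.8],
    [0.0, 0.0, 0.8, 1.0],
])

k = min_components(R)  # 2: one component per correlated pair
```

In this example a single component explains 95% of the variance of the first two variables but none of the variance of the other two, so a criterion based on total variance alone could stop too early, while a per-variable criterion asks for two components.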
SE · Dec 17, 2019
Detection of a Source Code Plagiarism in a Student Programming Competition
Zenon Gniazdowski, Maciej Boniecki
The article presents a system for testing the independence of solutions to algorithmic problems sent by students as part of the student programming competition. First, the context was discussed, as well as the need to organize programming competitions resulting from this context. Then, an algorithm was proposed to study the mutual similarity of source codes of programs sent as part of a programming competition. Since, after implementation, the algorithm was used in practice, examples of its application for detecting the plagiarism of source codes of solutions in two programming competitions conducted as part of classes on Algorithms and Numerical Methods were also presented. Finally, the effectiveness of the solutions used in the work was discussed.
LG · Sep 7, 2019
On the clustering of correlated random variables
Zenon Gniazdowski, Dawid Kaliszewski
In this work, the possibility of clustering correlated random variables was examined, with respect to both their mutual similarity and their similarity to the principal components. The k-means algorithm and spectral algorithms were used for clustering. For spectral methods, two similarity matrices were used: the matrix of a relation established on the level of correlation and the matrix of coefficients of determination. For four different data sets, different ways of measuring the dissimilarity of variables were analyzed, as was the impact of the diversity of initial points on the efficiency of the k-means algorithm.
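As a rough illustration of spectral clustering of variables with the matrix of coefficients of determination as the similarity matrix, the sketch below splits four variables into two groups. It is not the authors' procedure: the use of the unnormalized graph Laplacian, the split by the sign of the Fiedler vector, the function name `spectral_bipartition`, and the example correlation matrix are all assumptions made for illustration.

```python
import numpy as np

def spectral_bipartition(R):
    """Split variables into two clusters using S = R**2 as similarity.

    Assumed technique: unnormalized Laplacian L = D - S and the sign of
    the eigenvector belonging to the second-smallest eigenvalue.
    """
    S = R ** 2                         # coefficients of determination
    D = np.diag(S.sum(axis=1))         # degree matrix
    L = D - S                          # unnormalized graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)   # ascending eigenvalues
    fiedler = eigvecs[:, 1]            # eigenvector of the second-smallest eigenvalue
    return (fiedler > 0).astype(int)   # cluster label 0 or 1 per variable

# Illustrative correlation matrix: two strongly correlated pairs,
# weakly correlated with each other.
R = np.array([
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
])

labels = spectral_bipartition(R)
```

The label values themselves are arbitrary because the sign of an eigenvector is not unique; only the grouping matters. Squaring the correlations makes strongly negatively correlated variables count as similar, which is one reason to prefer coefficients of determination over raw correlations as a similarity measure.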
ML · Jan 8, 2016
Numerical Coding of Nominal Data
Zenon Gniazdowski, Michal Grabowski
In this paper, a novel approach for coding nominal data is proposed. Each value of the given nominal data is assigned a rank in the form of a complex number. The proposed method does not lose any information about the attribute and reveals other, previously unknown properties. An approach based on these new properties can be used for classification. The analyzed example shows that classification using coded nominal data, or both numerical and coded nominal data, is more effective than classification that uses only numerical data.