LG · ML · Jun 29, 2020

Kendall transformation: a robust representation of continuous data for information theory

arXiv:2006.15991v2 · 8 citations
Originality: Incremental advance
AI Analysis

The paper gives researchers in statistics and machine learning a robust way to handle continuous data in information theory, especially with small sample sizes. The advance is incremental because it builds on existing non-parametric approaches.

The paper tackles the problem of applying information-theoretic methods to continuous data by introducing the Kendall transformation, which converts continuous features into categorical pairwise order relations, enabling direct application without differential entropy and improving robustness, though at the cost of dropping complex interactions.

Kendall transformation is a conversion of an ordered feature into a vector of pairwise order relations between individual values. This way, it preserves the ranking of observations and represents it in a categorical form. Such a transformation allows for generalisation of methods requiring strictly categorical input, especially in the limit of a small number of observations, when discretisation becomes problematic. In particular, many approaches of information theory can be directly applied to Kendall-transformed continuous data without relying on differential entropy or any additional parameters. Moreover, by filtering information down to that contained in the ranking, Kendall transformation leads to better robustness at a reasonable cost of dropping sophisticated interactions, which are anyhow unlikely to be correctly estimated. In bivariate analysis, Kendall transformation can be related to popular non-parametric methods, showing the soundness of the approach. The paper also demonstrates its efficiency in multivariate problems and provides an example analysis of real-world data.
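
To make the idea concrete, here is a minimal Python sketch of the transformation described above, followed by a plug-in mutual information estimate on the transformed vectors. The function names, the three-symbol encoding (`<`, `>`, `=`), the choice to enumerate every ordered pair (i, j) with i ≠ j, and the toy data are illustrative assumptions, not the paper's reference implementation.

```python
from collections import Counter
from itertools import permutations
from math import log


def kendall_transform(values):
    """Encode every ordered pair (i, j), i != j, as '<', '>' or '='.

    Assumption: pairs are enumerated in itertools.permutations order.
    """
    def relation(a, b):
        return "<" if a < b else ">" if a > b else "="

    return [relation(values[i], values[j])
            for i, j in permutations(range(len(values)), 2)]


def mutual_information(xs, ys):
    """Plug-in mutual information (in nats) of two equal-length categorical vectors."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(c / n * log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


# Toy data (made up for illustration): four observations of two features.
x = [0.3, 1.7, 0.9, 2.4]
y = [10.0, 40.0, 45.0, 80.0]

kx, ky = kendall_transform(x), kendall_transform(y)
mi = mutual_information(kx, ky)
print(kx)
print(mi)
```

Because the transformed vectors are ordinary categorical data, no binning or differential entropy is needed. When there are no ties, a direct calculation shows that this estimate depends only on the fraction c of concordant pairs: it equals log 2 + c log c + (1 − c) log(1 − c), with c = (1 + τ)/2 for Kendall's τ, which illustrates the link to non-parametric correlation mentioned in the abstract.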

Code Implementations: 1 repo
