DATA-ANHCLGNA Nov 15, 2025

Human-aligned Quantification of Numerical Data

arXiv:2511.15723v1
Originality: Incremental advance
AI Analysis

This work addresses the challenge of making the quantification of numerical data more human-aligned. The advance is incremental: it compares existing metrics on a specific task.

The study evaluates metrics such as the Silhouette coefficient and information compression for classifying numerical values into meaningful categories. It finds that a Silhouette coefficient above 0.65 together with a Dip Test below 0.5 indicates classifiability, and that the Silhouette coefficient aligns better with human intuition than normalized centroid distance.

Quantifying numerical data involves addressing two key challenges: first, determining whether the data can be naturally quantified, and second, identifying the numerical intervals or ranges of values that correspond to specific value classes, referred to as "quantums," which represent statistically meaningful states. If such quantification is feasible, continuous streams of numerical data can be transformed into sequences of "symbols" that reflect the states of the system described by the measured parameter. People often perform this task intuitively, relying on common sense or practical experience, while information theory and computer science offer computable metrics for this purpose. In this study, we assess the applicability of metrics based on information compression and the Silhouette coefficient for quantifying numerical data. We also investigate the extent to which these metrics correlate with one another and with what is commonly referred to as "human intuition." Our findings suggest that the ability to classify numeric data values into distinct categories is associated with a Silhouette coefficient above 0.65 and a Dip Test below 0.5; otherwise, the data can be treated as following a unimodal normal distribution. Furthermore, when quantification is possible, the Silhouette coefficient appears to align more closely with human intuition than the "normalized centroid distance" method derived from an information compression perspective.
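To make the Silhouette criterion from the abstract concrete, here is a minimal Python sketch that splits one-dimensional values into two candidate classes and scores the split. It is an illustration under stated assumptions, not the authors' procedure: the function names `two_means_1d` and `silhouette_1d`, the choice of exactly two classes, and the use of a basic k-means loop to find them are all hypothetical. Only the 0.65 threshold comes from the abstract. The Dip Test is not implemented here.

```python
from statistics import mean


def two_means_1d(values, iterations=100):
    """Assign each value to one of two classes with a basic k-means loop.

    Assumption: two classes, with centroids seeded at the min and max.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("need at least two distinct values")
    centroids = [lo, hi]
    labels = []
    for _ in range(iterations):
        # Each value takes the label of its nearest centroid.
        labels = [
            0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
            for v in values
        ]
        updated = [
            mean(v for v, lab in zip(values, labels) if lab == k)
            for k in (0, 1)
        ]
        if updated == centroids:
            break
        centroids = updated
    return labels


def silhouette_1d(values, labels):
    """Mean silhouette coefficient using absolute difference as distance."""
    scores = []
    for i, (v, lab) in enumerate(zip(values, labels)):
        # a: mean distance to the other members of the same class.
        same = [
            abs(v - w)
            for j, (w, m) in enumerate(zip(values, labels))
            if m == lab and j != i
        ]
        if not same:
            scores.append(0.0)  # usual convention for a singleton class
            continue
        a = mean(same)
        # b: mean distance to the nearest other class.
        b = min(
            mean(abs(v - w) for w, m in zip(values, labels) if m == other)
            for other in set(labels) - {lab}
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


SILHOUETTE_THRESHOLD = 0.65  # threshold reported in the abstract

if __name__ == "__main__":
    readings = [1.0, 1.1, 0.9, 10.0, 10.2, 9.8]  # made-up sample data
    labels = two_means_1d(readings)
    score = silhouette_1d(readings, labels)
    print(labels, score, score > SILHOUETTE_THRESHOLD)
```

In this sketch the returned labels play the role of the "symbols" described above: each reading is replaced by the index of the class it falls in. A fuller check would also run a dip test for unimodality, which needs a dedicated statistical implementation.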

