LG · Feb 12, 2025
The Art of Misclassification: Too Many Classes, Not Enough Points
Mario Franco, Gerardo Febres, Nelson Fernández et al.
Classification is a ubiquitous and fundamental problem in artificial intelligence and machine learning, and extensive effort has gone into developing more powerful classifiers and larger datasets. However, classification performance is ultimately constrained by the intrinsic properties of the dataset, independently of computational power or model complexity. In this work, we introduce a formal entropy-based measure of classificability, which quantifies the inherent difficulty of a classification problem by assessing the uncertainty in class assignments given feature representations. This measure captures the degree of class overlap, aligns with human intuition, and serves as an upper bound on classification performance. Our results establish, for a given problem, a theoretical limit beyond which no classifier can improve classification accuracy, regardless of its architecture or the amount of data. Our approach provides a principled framework for understanding when classification is inherently fallible and fundamentally ambiguous.
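As a rough illustration of "uncertainty in class assignments given feature representations", the sketch below computes the empirical conditional entropy H(Y|X) of labels given discrete feature values, together with the accuracy obtained by predicting the majority label for each feature value, which no classifier that sees only X can exceed on that sample. This is a standard information-theoretic proxy chosen for illustration, not the paper's classificability measure; the function names are hypothetical, and features are assumed to be discrete (continuous features would need binning first).

```python
from collections import Counter, defaultdict
import math


def _group_labels(xs, ys):
    """Count how often each label occurs for each discrete feature value."""
    groups = defaultdict(Counter)
    for x, y in zip(xs, ys):
        groups[x][y] += 1
    return groups


def conditional_entropy(xs, ys):
    """Empirical H(Y|X) in bits: label uncertainty left once the feature value is known."""
    n = len(xs)
    h = 0.0
    for counts in _group_labels(xs, ys).values():
        total = sum(counts.values())
        for c in counts.values():
            p = c / total
            # weight each feature value's label entropy by how often that value occurs
            h -= (total / n) * p * math.log2(p)
    return h


def accuracy_upper_bound(xs, ys):
    """Best accuracy any classifier using only X can reach on this sample:
    predict the most frequent label for every feature value."""
    groups = _group_labels(xs, ys)
    correct = sum(c.most_common(1)[0][1] for c in groups.values())
    return correct / len(xs)


xs = ["a", "a", "b", "b"]  # feature value "a" is ambiguous, "b" is not
ys = [0, 1, 1, 1]
print(conditional_entropy(xs, ys))   # 0.5
print(accuracy_upper_bound(xs, ys))  # 0.75
```

When classes overlap completely for some feature value, no amount of extra model capacity can resolve it; only a different feature representation can lower H(Y|X).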
SD · Oct 7, 2015
Music Viewed by its Entropy Content: A Novel Window for Comparative Analysis
Gerardo Febres, Klaus Jaffe
Polyphonic music files were analyzed using the set of symbols that produced the Minimal Entropy Description, which we call the Fundamental Scale. This allowed us to create a novel space to represent music pieces by developing: a) a method to adjust a description from its original scale of observation to a general scale, and b) the concept of higher-order entropy, defined as the entropy associated with the deviations of a frequency-ranked symbol profile from a perfect Zipf profile. We call this diversity index the "2nd Order Entropy". Applying these methods to a variety of musical pieces showed that the space of "symbolic specific diversity-entropy" and that of "2nd order entropy" capture characteristics that are unique to each music type, style, composer and genre. Some clustering of these properties around each musical category is shown. This method allows us to visualize a historical trajectory of academic music across this space, from medieval to contemporary academic music. We show that describing musical structures using entropy and symbolic diversity allows us to characterize traditional and popular expressions of music. These classification techniques promise to be useful in other disciplines, for example in pattern recognition and machine learning.
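The abstract does not give a formula for the "2nd Order Entropy", so the sketch below is only one hypothetical reading of "the entropy associated with the deviations of a frequency-ranked symbol profile from a perfect Zipf profile". It ranks symbol frequencies, builds a Zipf profile proportional to 1/r over the same number of ranks, and returns the Shannon entropy of the absolute deviations, normalized to [0, 1]. The function names, the use of absolute deviations, and the normalization are all assumptions, not the authors' method.

```python
from collections import Counter
import math


def rank_frequency_profile(symbols):
    """Relative frequencies of the symbols, sorted from most to least frequent."""
    counts = sorted(Counter(symbols).values(), reverse=True)
    total = sum(counts)
    return [c / total for c in counts]


def zipf_profile(n):
    """Ideal Zipf profile over n ranks: frequency proportional to 1/rank."""
    weights = [1.0 / r for r in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def zipf_deviation_entropy(symbols):
    """Hypothetical proxy: normalized entropy of |observed - Zipf| per rank.
    Returns 0.0 when the observed profile matches Zipf exactly."""
    p = rank_frequency_profile(symbols)
    q = zipf_profile(len(p))
    dev = [abs(a - b) for a, b in zip(p, q)]
    total = sum(dev)
    if total == 0 or len(dev) < 2:
        return 0.0
    probs = [d / total for d in dev if d > 0]
    h = -sum(x * math.log2(x) for x in probs)
    return h / math.log2(len(dev))  # divide by the maximum possible entropy


print(zipf_deviation_entropy("aab"))      # 0.0: frequencies 2/3, 1/3 are exactly Zipf
print(zipf_deviation_entropy("aaaabbc"))
```

Symbols here are single characters for brevity; for music they would be whatever symbols the Fundamental Scale analysis selects.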
IT · Oct 5, 2015
Calculating entropy at different scales among diverse communication systems
Gerardo Febres, Klaus Jaffe
We evaluated the impact of changing the observation scale on entropy measures for text descriptions. MIDI-coded music, computer code and two human natural languages were studied at the scale of characters, at the scale of words, and at the Fundamental Scale, which results from adjusting the length of the symbols used to interpret each text description until it produces minimum entropy. The results show that the Fundamental Scale method is comparable to the use of words when measuring entropy levels in written texts. However, this method can also be used in communication systems that lack words, such as music. Measuring symbolic entropy at the Fundamental Scale allows relative levels of complexity to be calculated quantitatively for different communication systems. The results offer a new view of the structural differences among the communication systems studied.
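The effect of the observation scale can be seen by computing plain Shannon entropy over two tokenizations of the same text, characters and words. This is a minimal sketch: splitting on whitespace for words is a simplifying assumption, the function names are hypothetical, and the Fundamental Scale search (adjusting symbol length until entropy is minimal) is not shown.

```python
from collections import Counter
import math


def shannon_entropy(symbols):
    """Shannon entropy in bits per symbol of a sequence of hashable symbols."""
    counts = Counter(symbols)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_at_scales(text):
    """Entropy of the same text read as characters and as whitespace-separated words."""
    return {
        "characters": shannon_entropy(text),
        "words": shannon_entropy(text.split()),
    }


print(entropy_at_scales("to be or not to be"))
```

The two numbers are not directly comparable in absolute terms, since they are measured per symbol over different alphabets; this is one reason a common reference scale is useful.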
CL · Jan 28, 2014
Quantifying literature quality using complexity criteria
Gerardo Febres, Klaus Jaffe
We measured entropy and symbolic diversity for English and Spanish texts, including works by Nobel laureates in literature and other famous authors. Entropy, symbol diversity and symbol frequency profiles were compared across these four groups. We also built a scale sensitive to the quality of writing and evaluated its relationship with Flesch's readability index for English and Szigriszt's perspicuity index for Spanish. Results suggest a correlation between quality of writing and both entropy and word diversity. Text genre also influences the resulting entropy and diversity of a text. Results suggest that automated quality assessment of texts is plausible.
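For reference, the two readability indices mentioned above are simple functions of word, sentence and syllable counts. The sketch below uses the standard Flesch reading-ease formula and Szigriszt's perspicuity formula (a Spanish adaptation of Flesch); syllable counting, which is language-specific and the hard part in practice, is left to the caller, and the function names are hypothetical.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch reading ease (English): higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def szigriszt_perspicuity(words, sentences, syllables):
    """Szigriszt's perspicuity index (Spanish): higher scores mean easier text."""
    return 206.835 - 62.3 * (syllables / words) - (words / sentences)


# Counts are supplied by the caller; syllabification is not implemented here.
print(round(flesch_reading_ease(100, 5, 150), 3))    # 59.635
print(round(szigriszt_perspicuity(100, 5, 150), 3))  # 93.385
```

Both indices depend only on sentence length and word length, which is why they can be compared against entropy and diversity measures that look at the distribution of words instead.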
CL · Nov 20, 2013
Complexity measurement of natural and artificial languages
Gerardo Febres, Klaus Jaffe, Carlos Gershenson
We compared entropy for texts written in natural languages (English, Spanish) and artificial languages (computer software), based on a simple expression for entropy as a function of message length and specific word diversity. Code written in artificial languages showed higher entropy than text of similar length expressed in natural languages. Spanish texts exhibit more symbolic diversity than English ones. Results showed that algorithms based on complexity measures differentiate artificial from natural languages, and that text analysis based on complexity measures reveals important aspects of their nature. We propose specific expressions to examine entropy-related aspects of texts and to estimate the values of entropy, emergence, self-organization and complexity based on specific diversity and message length.
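The quantities named above can be sketched directly from word counts, assuming the definitions used by Gershenson and colleagues: emergence E is Shannon entropy normalized to [0, 1], self-organization is S = 1 − E, and complexity is C = 4·E·S. Specific diversity is taken here as the number of distinct words divided by the total number of words. Normalizing by log2 of the number of distinct words is an assumption, and the paper's own expressions, which relate these values to message length, may differ; the function name is hypothetical.

```python
from collections import Counter
import math


def complexity_profile(words):
    """Specific diversity, emergence E, self-organization S and complexity C
    for a sequence of words (assumed definitions; see the note above)."""
    counts = Counter(words)
    n = len(words)
    distinct = len(counts)
    specific_diversity = distinct / n
    if distinct < 2:
        emergence = 0.0  # a single repeated word carries no information
    else:
        h = -sum((c / n) * math.log2(c / n) for c in counts.values())
        emergence = h / math.log2(distinct)  # normalize entropy to [0, 1]
    self_organization = 1.0 - emergence
    complexity = 4.0 * emergence * self_organization  # peaks at E = S = 0.5
    return {
        "specific_diversity": specific_diversity,
        "emergence": emergence,
        "self_organization": self_organization,
        "complexity": complexity,
    }


print(complexity_profile("the cat saw the dog".split()))
```

With this choice, C is zero both for a perfectly uniform vocabulary (maximal emergence) and for a single repeated word (maximal self-organization), and is largest in between.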