On Irrelevance of Attributes in Flexible Prediction
This work addresses the problem of improving classification accuracy in concept formation and is aimed at machine learning researchers; it appears incremental, since it builds on existing flexible prediction methods.
The paper investigates how attribute relevance affects conceptual hierarchies in flexible prediction, finding that both weakly and strongly correlated attributes degrade classification, and that derived attribute construction and scaling strongly influence the hierarchy.
This paper analyses properties of the conceptual hierarchy obtained via an incremental concept formation method called "flexible prediction", in order to determine what kind of "relevance" of the participating attributes may be required for a meaningful conceptual hierarchy. The impact of the selection of simple and combined attributes, of the scaling and distribution of individual attributes, and of the correlation strengths among them is investigated. Paradoxically, attributes that are weakly related to other attributes and attributes that are strongly related to them both have a deteriorating impact on the overall classification. The proper construction of derived attributes, as well as the choice of scaling for individual attributes, strongly influences the obtained concept hierarchy. The density of an attribute's distribution seems to influence the classification only weakly. It also seems that concept hierarchies (taxonomies) reflect a compromise between the data and our interest in some objective truth about the data. To obtain classifications better suited to one's purposes, breaking the symmetry among attributes (by dividing them into dependent and independent ones and applying different evaluation formulas to their contributions) is suggested. Both continuous and discrete variables are considered, and some methodologies specific to continuous variables are discussed.
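The abstract does not state the evaluation formula. The sketch below assumes the COBWEB-style category utility commonly used in incremental concept formation for discrete attributes, and a CLASSIT-style variant (sum of 1/σ in the cluster minus 1/σ in the parent, constant factor dropped) for continuous ones. It illustrates two points from the abstract: restricting the score to a chosen subset of "dependent" attributes is one simple, hypothetical way to break attribute symmetry, and because σ carries an attribute's units, rescaling one attribute can change which partition wins. All function names and the `acuity` floor value are illustrative assumptions, not the paper's method.

```python
import statistics
from collections import Counter
from typing import Dict, Iterable, Sequence

Nominal = Dict[str, str]    # record with discrete attribute values
Numeric = Dict[str, float]  # record with continuous attribute values


def _sum_sq_probs(records: Sequence[Nominal], attrs: Iterable[str]) -> float:
    """Sum over attributes A_i of sum_j P(A_i = v_ij)^2 within `records`."""
    n = len(records)
    total = 0.0
    for a in attrs:
        counts = Counter(r[a] for r in records)
        total += sum((c / n) ** 2 for c in counts.values())
    return total


def category_utility(partition: Sequence[Sequence[Nominal]], attrs: Sequence[str]) -> float:
    """COBWEB-style category utility of a partition, scored over `attrs` only.

    Passing all attributes gives the symmetric score; passing only the
    attributes treated as "dependent" is a hypothetical asymmetric variant.
    """
    parent = [r for cluster in partition for r in cluster]
    n = len(parent)
    base = _sum_sq_probs(parent, attrs)
    gain = sum(len(c) / n * (_sum_sq_probs(c, attrs) - base) for c in partition if c)
    return gain / len(partition)


def classit_cu(partition: Sequence[Sequence[Numeric]], attrs: Sequence[str],
               acuity: float = 1e-3) -> float:
    """CLASSIT-style utility: per attribute, 1/sigma_cluster - 1/sigma_parent.

    `acuity` is an assumed floor on sigma so single-value clusters stay finite.
    Rescaling an attribute by s rescales its contribution by 1/s.
    """
    parent = [r for cluster in partition for r in cluster]
    n = len(parent)

    def inv_sd(records: Sequence[Numeric], a: str) -> float:
        return 1.0 / max(statistics.pstdev([r[a] for r in records]), acuity)

    gain = sum(
        len(c) / n * sum(inv_sd(c, a) - inv_sd(parent, a) for a in attrs)
        for c in partition if c
    )
    return gain / len(partition)


if __name__ == "__main__":
    # Four points; compare a split on x with a split on y.
    pts = [{"x": 0.0, "y": 0.0}, {"x": 1.0, "y": 3.0},
           {"x": 4.0, "y": 1.0}, {"x": 5.0, "y": 4.0}]
    split_x = [[pts[0], pts[1]], [pts[2], pts[3]]]
    split_y = [[pts[0], pts[2]], [pts[1], pts[3]]]
    print("original scale:", classit_cu(split_x, ["x", "y"]), classit_cu(split_y, ["x", "y"]))

    # Same data with x measured in units ten times smaller.
    big = [{"x": 10 * p["x"], "y": p["y"]} for p in pts]
    split_x10 = [[big[0], big[1]], [big[2], big[3]]]
    split_y10 = [[big[0], big[2]], [big[1], big[3]]]
    print("x scaled by 10:", classit_cu(split_x10, ["x", "y"]), classit_cu(split_y10, ["x", "y"]))
```

On these four points the split on x scores higher at the original scale, while multiplying x by 10 makes the split on y score higher, even though the underlying grouping structure is unchanged.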