The Spaces of Data, Information, and Knowledge
This work addresses automated knowledge discovery in domains such as geometry, though its contribution appears incremental because it builds on the established principles of induction and deduction.
The paper tackles automatic knowledge discovery from data by constructing three topological spaces, one each for data, information, and knowledge, and illustrates the approach in the case of geometry.
We study the data space $D$ of any given data set $X$ and explain how functions and relations are defined over $D$. From $D$, and for a specific domain $\Delta$, we construct the information space $I$ of $X$ by interpreting variables, functions, and explicit relations over $D$ in $\Delta$ and by including the other relations that $D$ implies under this interpretation. From $I$ we then build the knowledge space $K$ of $X$ as the product of two spaces, $K_T$ and $K_P$. The space $K_T$ is obtained from $I$ by using the induction principle to generalize propositional relations to quantified relations, the deduction principle to generate new relations, and standard mechanisms to validate relations; $K_P$ is the space of specifications of methods with operational instructions that are valid in $K_T$. Our construction of the three topological spaces makes the following key observation clear: retrieving information from the given data set for $\Delta$ consists essentially in mining domain objects and relations, and discovering knowledge from the retrieved information consists essentially in applying the induction and deduction principles to generate propositions, synthesizing and modeling the information to generate specifications of methods with operational instructions, and validating the propositions and specifications. Based on this observation, efficient approaches may be designed to discover profound knowledge automatically from simple data, as demonstrated by the results of our study in the case of geometry.
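To make the data, information, and knowledge steps concrete, the following is a minimal toy sketch in Python and not the construction used in the paper. It assumes that the data are bare 6-tuples of numbers, that the domain $\Delta$ is plane geometry, and that the relation of interest is the midline property of a triangle. All names (`DATA`, `interpret`, `midline_relation`, `induce`, `validate`) are hypothetical and introduced only for illustration. The validation step is purely empirical testing on random instances, which stands in for the "standard mechanisms to validate relations" and is not a proof.

```python
from fractions import Fraction
import random

# Data space (toy stand-in): raw numeric records with no geometric meaning yet.
DATA = [
    (0, 0, 4, 0, 1, 3),
    (1, 1, 5, 2, 2, 6),
    (-2, 0, 3, 1, 0, 5),
]

def interpret(record):
    """Information step: read a record as a triangle ABC in the plane."""
    ax, ay, bx, by, cx, cy = (Fraction(v) for v in record)
    return (ax, ay), (bx, by), (cx, cy)

def midpoint(p, q):
    """Midpoint of the segment pq, computed exactly."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

def parallel(p, q, r, s):
    """True if segment pq is parallel to segment rs (zero cross product)."""
    return (q[0] - p[0]) * (s[1] - r[1]) - (q[1] - p[1]) * (s[0] - r[0]) == 0

def midline_relation(tri):
    """Propositional relation for one triangle: the segment joining the
    midpoints of AB and AC is parallel to BC."""
    a, b, c = tri
    return parallel(midpoint(a, b), midpoint(a, c), b, c)

def induce(records):
    """Induction step: propose the quantified relation ("for all triangles")
    only if every observed instance satisfies the propositional relation."""
    return all(midline_relation(interpret(r)) for r in records)

def validate(trials=200, seed=0):
    """Validation step (empirical only, not a proof): test the conjecture
    on randomly generated triangles with integer coordinates."""
    rng = random.Random(seed)
    for _ in range(trials):
        record = tuple(rng.randint(-50, 50) for _ in range(6))
        if not midline_relation(interpret(record)):
            return False
    return True

conjecture_holds_on_data = induce(DATA)
conjecture_survives_validation = validate()
```

In this sketch, `interpret` plays the role of passing from data to information, `induce` generalizes instance-level relations to a quantified conjecture, and `validate` checks that conjecture. Exact rational arithmetic via `Fraction` is a design choice to avoid floating-point error in the parallelism test; a real system would replace the random testing with a symbolic or deductive check.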