Kan Extensions in Data Science and Machine Learning
This work proposes a theoretical framework for data science problems such as extrapolation and clustering. The contribution appears incremental: it applies existing category-theoretic concepts to new domains without demonstrating broad state-of-the-art impact.
The paper addresses the problem of extending a function defined on a small set to a larger one by applying Kan extensions from category theory. This yields a classification algorithm and a procedure for learning clustering algorithms, both of which were tested on real data.
A common problem in data science is "use this function defined over this small set to generate predictions over that larger set." Extrapolation, interpolation, statistical inference and forecasting all reduce to this problem. The Kan extension is a powerful tool in category theory that generalizes this notion. In this work we explore several applications of Kan extensions to data science. We begin by deriving a simple classification algorithm as a Kan extension and experimenting with this algorithm on real data. Next, we use the Kan extension to derive a procedure for learning clustering algorithms from labels and explore the performance of this procedure on real data. We then investigate how Kan extensions can be used to learn a general mapping from datasets of labeled examples to functions and to approximate a complex function with a simpler one.
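To make the "extend a function from a small set to a larger set" idea concrete, the following is a minimal sketch of the simplest case. It assumes the larger set is the real line ordered by `<=`, the smaller set is a finite collection of labeled points, and the labels are booleans ordered `False < True`. Under those assumptions the left Kan extension along the inclusion is the join (logical OR) of labels at or below a query point, and the right Kan extension is the meet (logical AND) of labels at or above it. The abstract does not spell out its construction, so the names `left_kan`, `right_kan`, and `labels`, and the sample data, are illustrative assumptions and not the paper's algorithm.

```python
from typing import Dict


def left_kan(labels: Dict[float, bool], x: float) -> bool:
    """Left Kan extension in a preorder: join (OR) of labels at points <= x.

    The join over an empty set is the bottom element, False.
    """
    return any(label for point, label in labels.items() if point <= x)


def right_kan(labels: Dict[float, bool], x: float) -> bool:
    """Right Kan extension in a preorder: meet (AND) of labels at points >= x.

    The meet over an empty set is the top element, True.
    """
    return all(label for point, label in labels.items() if point >= x)


# Hypothetical labeled points. The labeling must be monotone
# (order-preserving) for it to be a functor between preorders.
labels = {1.0: False, 2.0: False, 4.0: True, 5.0: True}

for query in (0.5, 2.0, 3.0, 4.0, 6.0):
    print(query, left_kan(labels, query), right_kan(labels, query))
```

In this sketch both extensions reproduce the given labels on the labeled points and can differ only in the gap between the largest `False` point and the smallest `True` point, where the left extension predicts `False` and the right extension predicts `True`. The two extensions therefore bracket the region in which the data does not determine the label.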