Lifelong Learning in Costly Feature Spaces
This work addresses the challenge of building resource-efficient lifelong learning agents for real-world applications with costly features, though it is incremental as it builds on existing lifelong learning paradigms with new theoretical insights.
The paper tackles the problem of lifelong learning in domains where feature evaluations are costly, proposing a framework that reuses information from previous tasks to learn new tasks efficiently, with algorithms for decision trees/lists and monomials/polynomials showing strong feature-efficiency guarantees, such as needing only slightly more feature evaluations per training example than for prediction.
An important long-term goal in machine learning systems is to build learning agents that, like humans, can learn many tasks over their lifetime, and moreover use information from these tasks to improve their ability to do so efficiently. In this work, our goal is to provide new theoretical insights into the potential of this paradigm. In particular, we propose a lifelong learning framework that adheres to a novel notion of resource efficiency that is critical in many real-world domains where feature evaluations are costly. That is, our learner aims to reuse information from previously learned related tasks to learn future tasks in a feature-efficient manner. Furthermore, we consider novel combinatorial ways in which learning tasks can relate. Specifically, we design lifelong learning algorithms for two structurally different and widely used families of target functions: decision trees/lists and monomials/polynomials. We also provide strong feature-efficiency guarantees for these algorithms; in fact, we show that in order to learn future targets, we need only slightly more feature evaluations per training example than what is needed to predict on an arbitrary example using those targets. We also provide algorithms with guarantees in an agnostic model where not all the targets are related to each other. Finally, we also provide lower bounds on the performance of a lifelong learner in these models, which are in fact tight under some conditions.