Active Value Querying to Minimize Additive Error in Subadditive Set Function Learning
This work is significant for researchers and practitioners dealing with resource-intensive specification of subadditive set functions, particularly in areas like combinatorial auctions and interpretable machine learning, by providing methods to handle missing values and reduce approximation error.
This paper addresses the challenge of specifying subadditive set functions, which typically requires an exponential number of subset values, by focusing on minimizing additive error when values are missing. The authors explore minimal and maximal completions of set functions and develop methods to reduce the distance between them by actively querying additional subset values.
Subadditive set functions play a pivotal role in computational economics (especially in combinatorial auctions), combinatorial optimization or artificial intelligence applications such as interpretable machine learning. However, specifying a set function requires assigning values to an exponentially large number of subsets in general, a task that is often resource-intensive in practice, particularly when the values derive from external sources such as retraining of machine learning models. A~simple omission of certain values introduces ambiguity that becomes even more significant when the incomplete set function has to be further optimized over. Motivated by the well-known result about inapproximability of subadditive functions using deterministic value queries with respect to a multiplicative error, we study a problem of approximating an unknown subadditive (or a subclass of thereof) set function with respect to an additive error -- i. e., we aim to efficiently close the distance between minimal and maximal completions. Our contributions are threefold: (i) a thorough exploration of minimal and maximal completions of different classes of set functions with missing values and an analysis of their resulting distance; (ii) the development of methods to minimize this distance over classes of set functions with a known prior, achieved by disclosing values of additional subsets in both offline and online manner; and (iii) empirical demonstrations of the algorithms' performance in practical scenarios.