LG · Oct 11, 2022

Synthetic Model Combination: An Instance-wise Approach to Unsupervised Ensemble Learning

arXiv:2210.05320v14.65 citations · h-index: 69 · Has Code

Originality: Incremental advance

AI Analysis

The paper addresses a challenge faced by stakeholders in fields like finance and medicine who need to combine privately trained models without sharing the underlying data, and it offers a more flexible alternative to global model selection or ensembling.

The paper tackles the problem of unsupervised ensemble learning when only models and their predictions are available, not the training data, by proposing an instance-wise ensembling method that weights models based on their domain relevance for each instance. It demonstrates the method's effectiveness on classical tasks and a real-world pharmacological use case, achieving improvements such as a 15% reduction in prediction error in the vancomycin dosing scenario.

Consider making a prediction over new test data without any opportunity to learn from a training set of labelled data, given instead access to a set of expert models and their predictions alongside some limited information about the dataset used to train them. In scenarios from finance to the medical sciences, and even consumer practice, stakeholders have developed models on private data they either cannot, or do not want to, share. Given the value of, and legislation surrounding, personal information, it is not surprising that only the models, and not the data, will be released; the pertinent question becomes: how best to use these models? Previous work has focused on global model selection or ensembling, resulting in a single final model across the feature space. However, machine learning models perform notoriously poorly on data outside their training domain, and so we argue that when ensembling models the weightings for individual instances must reflect their respective domains; in other words, models that are more likely to have seen information on that instance should have more attention paid to them. We introduce a method for such instance-wise ensembling of models, including a novel representation learning step for handling sparse high-dimensional domains. Finally, we demonstrate the need for, and generalisability of, our method on classical machine learning tasks, and highlight a real-world use case in the pharmacological setting of vancomycin precision dosing.
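
To make the idea of instance-wise weighting concrete, here is a minimal sketch in Python. It is not the paper's implementation: it assumes, purely for illustration, that the "limited information" about each model's training data is a diagonal Gaussian summary (mean and variance per feature), and it omits the representation learning step entirely. All function names and the toy models are hypothetical.

```python
import numpy as np

def gaussian_log_density(x, mean, var):
    """Log-density of a diagonal Gaussian, evaluated for each row of x."""
    x = np.atleast_2d(x)
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=1)

def instance_wise_weights(x, domains):
    """Per-instance model weights: a softmax over each domain's log-density at x.

    `domains` is a list of (mean, var) pairs, one per model. This Gaussian
    summary is an assumption of the sketch, not something the source specifies.
    """
    log_p = np.stack([gaussian_log_density(x, m, v) for m, v in domains], axis=1)
    log_p -= log_p.max(axis=1, keepdims=True)  # stabilise before exponentiating
    w = np.exp(log_p)
    return w / w.sum(axis=1, keepdims=True)

def ensemble_predict(x, models, domains):
    """Weighted average of model predictions, with weights that vary per instance."""
    x = np.atleast_2d(x)
    preds = np.stack([f(x) for f in models], axis=1)  # shape (n_instances, n_models)
    return np.sum(instance_wise_weights(x, domains) * preds, axis=1)

# Toy setup: two hypothetical expert models trained around x = -2 and x = +2.
model_a = lambda x: np.full(len(x), 1.0)  # always predicts 1
model_b = lambda x: np.full(len(x), 3.0)  # always predicts 3
domains = [(np.array([-2.0]), np.array([1.0])),
           (np.array([2.0]), np.array([1.0]))]

x_test = np.array([[-2.0], [0.0], [2.0]])
y_hat = ensemble_predict(x_test, [model_a, model_b], domains)
print(y_hat)
```

In this toy setup the ensemble follows `model_a` near x = -2 and `model_b` near x = +2, and averages the two equally at x = 0, where both domains are equally likely. A global ensemble would instead apply one fixed weighting across the whole feature space.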

