LG AI | Mar 18, 2025

COPA: Comparing the incomparable in multi-objective model evaluation

Adrián Javaloy, Antonio Vergari, Isabel Valera

arXiv:2503.14321v37.11 citations | h-index: 23

Originality: Incremental advance

AI Analysis

This work addresses a challenge for ML practitioners: efficiently selecting models from large candidate sets when the objectives are diverse and incomparable. It offers an automated approach intended to reduce expert time and improve decision-making.

The paper tackles the problem of comparing and selecting machine learning models across multiple incomparable objectives such as accuracy, robustness, fairness, and scalability. It proposes COPA, which normalizes each objective through its cumulative function, approximated by relative rankings, and then aggregates the normalized objectives to match user preferences, enabling systematic navigation of the Pareto front. The paper demonstrates COPA on model selection and benchmarking tasks in fair ML, domain generalization, AutoML, and foundation models, where classical ways to normalize and aggregate objectives fall short.

In machine learning (ML), we often need to choose one among hundreds of trained ML models at hand, based on various objectives such as accuracy, robustness, fairness or scalability. However, it is often unclear how to compare, aggregate and, ultimately, trade off these objectives. Because objectives may be measured in different units and scales, this becomes a time-consuming task that requires expert knowledge. In this work, we investigate how objectives can be automatically normalized and aggregated to systematically help the user navigate the Pareto front. To this end, we make incomparable objectives comparable using their cumulative functions, approximated by their relative rankings. As a result, our proposed approach, COPA, can aggregate them while matching user-specific preferences, allowing practitioners to meaningfully navigate and search for models in the Pareto front. We demonstrate the potential impact of COPA in both model selection and benchmarking tasks across diverse ML areas such as fair ML, domain generalization, AutoML and foundation models, where classical ways to normalize and aggregate objectives fall short.
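
The normalization idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `rank_normalize` and `select_model`, the example numbers, and in particular the weighted-sum aggregation are assumptions made here for illustration, since the abstract does not specify how COPA aggregates the normalized objectives. The sketch only shows how relative rankings put objectives with different units (accuracy, latency in milliseconds) on a common scale before user weights are applied.

```python
import numpy as np


def rank_normalize(values, higher_is_better):
    """Approximate an objective's cumulative function from relative rankings.

    Returns, for each model, the fraction of models that are at least as good
    on this objective. Values lie in (0, 1]; lower means better.
    """
    values = np.asarray(values, dtype=float)
    # Flip the sign when larger raw values are better, so smaller is always better.
    oriented = -values if higher_is_better else values
    # Entry [i, j] is True when model j is at least as good as model i.
    at_least_as_good = oriented[None, :] <= oriented[:, None]
    return at_least_as_good.mean(axis=1)


def select_model(normalized, weights):
    """Pick the model with the smallest weighted sum of normalized objectives.

    The weighted sum is an illustrative assumption, not the paper's criterion.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return int(np.argmin(normalized @ weights))


# Hypothetical results for four models (made-up numbers for illustration).
accuracy = [0.90, 0.85, 0.80, 0.95]  # higher is better
latency_ms = [120, 40, 30, 300]      # lower is better

U = np.column_stack([
    rank_normalize(accuracy, higher_is_better=True),
    rank_normalize(latency_ms, higher_is_better=False),
])

print(select_model(U, weights=[0.8, 0.2]))  # accuracy-heavy preference
print(select_model(U, weights=[0.2, 0.8]))  # latency-heavy preference
```

With these made-up numbers, the accuracy-heavy weights select the model at index 3 (most accurate, slowest) and the latency-heavy weights select the model at index 2 (fastest, least accurate). Because both objectives are mapped to the same (0, 1] scale first, the result does not depend on whether latency is reported in milliseconds or seconds.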
