A call for better unit testing for invariant risk minimisation
This paper identifies critical flaws in a popular invariant risk minimization method, emphasizing the need for better testing to ensure reliable progress in domain generalization.
The paper demonstrates that the linearized IRM framework (IRMv1) can be unstable under small changes to the optimal regressor, leading to worse generalization to new environments compared to ERM, and highlights scaling issues in the setup.
In this paper we present a controlled study of the linearized IRM framework (IRMv1) introduced in Arjovsky et al. (2020). We show that the IRMv1 framework (and its variants) can be unstable under small changes to the optimal regressor. Notably, this can lead to worse generalisation to new environments, even compared with ERM, which simply converges to the global minimum over all training environments pooled together. We also highlight scaling issues in the IRMv1 setup. These observations underline the importance of rigorous evaluation and of unit testing for measuring progress towards IRM.
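For readers unfamiliar with the object under study, the following is a minimal sketch of the IRMv1 objective from Arjovsky et al. (2020): the sum over training environments of the risk plus a penalty equal to the squared gradient of that risk with respect to a dummy scalar classifier w, evaluated at w = 1. It is not the experimental code of this paper. The squared loss, the scalar featurizer, and the names `risk`, `irmv1_penalty`, and `irmv1_objective` are assumptions made for illustration.

```python
def risk(preds, targets, w=1.0):
    """Mean squared error of the scaled predictions w * preds (assumed loss)."""
    n = len(preds)
    return sum((w * p - y) ** 2 for p, y in zip(preds, targets)) / n


def irmv1_penalty(preds, targets):
    """Squared gradient of the risk w.r.t. the dummy scalar w, taken at w = 1.

    For squared loss, d/dw mean((w*p - y)^2) at w = 1 is mean(2*(p - y)*p).
    """
    n = len(preds)
    grad = sum(2.0 * (p - y) * p for p, y in zip(preds, targets)) / n
    return grad ** 2


def irmv1_objective(envs, featurizer, lam):
    """Sum over environments of risk + lam * penalty.

    envs is a list of (inputs, targets) pairs, one per training environment;
    featurizer maps a single input to a scalar prediction (hypothetical setup).
    """
    total = 0.0
    for xs, ys in envs:
        preds = [featurizer(x) for x in xs]
        total += risk(preds, ys) + lam * irmv1_penalty(preds, ys)
    return total
```

A unit test in the spirit of the abstract would check, for instance, that a featurizer which fits every environment exactly incurs zero risk and zero penalty, and that the analytic gradient inside `irmv1_penalty` agrees with a finite-difference estimate of `risk` around w = 1.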