On the statistical evaluation of algorithmic's computational experimentation with infeasible solutions
This work addresses a methodological bottleneck for researchers evaluating algorithms in optimization, particularly for nonconvex mixed-integer nonlinear problems, though it is incremental as it builds on existing ranking and statistical test frameworks.
The authors tackled the problem of statistically evaluating algorithm performance when data are non-normal, heteroscedastic, or have missing entries due to infeasible solutions, by proposing a bi-objective lexicographical ranking scheme. Using this scheme on benchmark problems, they found that the Iterative Rounding heuristic significantly outperformed the Feasibility Pump heuristic but was outperformed by the RECIPE heuristic.
The experimental evaluation of algorithms produces large datasets that generally do not follow a normal distribution or are heteroscedastic. In addition, some entries may be missing because an algorithm failed to find a feasible solution before the time limit was reached. These characteristics restrict the statistical evaluation of computational experiments. This work proposes a bi-objective lexicographical ranking scheme to evaluate datasets with such characteristics. The output ranking can be used as input to any desired statistical test. We used the proposed ranking scheme to assess the results obtained by the Iterative Rounding heuristic (IR). A Friedman test and a subsequent post-hoc test carried out on the ranked data demonstrated that IR performed significantly better than the Feasibility Pump heuristic when solving 152 benchmark instances of nonconvex mixed-integer nonlinear problems. However, the same tests also showed that the RECIPE heuristic was significantly better than IR on the same benchmark problems.
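To make the idea of a lexicographical ranking with missing entries concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not state which two objectives are compared, so the sketch assumes the first is the objective value (smaller is better) and the second is the running time, and that runs without a feasible solution share the worst rank. The function names, the data layout, and the sample values are hypothetical.

```python
from math import inf


def lexicographic_ranks(results):
    """Rank the algorithms on one problem instance (1 = best).

    `results` maps an algorithm name to (objective, seconds), or to None
    when no feasible solution was found within the time limit.
    Assumed order (not taken from the source): smaller objective first,
    then smaller running time; infeasible runs are ranked last.
    Tied algorithms receive the average of the positions they occupy.
    """
    def key(name):
        r = results[name]
        return (inf, inf) if r is None else (r[0], r[1])

    ordered = sorted(results, key=key)
    ranks = {}
    i = 0
    while i < len(ordered):
        # Extend j to cover the block of algorithms tied with ordered[i].
        j = i
        while j + 1 < len(ordered) and key(ordered[j + 1]) == key(ordered[i]):
            j += 1
        average = ((i + 1) + (j + 1)) / 2  # mean of positions i+1 .. j+1
        for name in ordered[i:j + 1]:
            ranks[name] = average
        i = j + 1
    return ranks


def friedman_statistic(rank_rows):
    """Friedman chi-square statistic, without the correction for ties."""
    n = len(rank_rows)           # number of problem instances
    names = list(rank_rows[0])   # algorithms being compared
    k = len(names)
    rank_sums = [sum(row[a] for row in rank_rows) for a in names]
    return (12.0 / (n * k * (k + 1))) * sum(s * s for s in rank_sums) \
        - 3.0 * n * (k + 1)


# Made-up results for three placeholder algorithms on three instances.
instances = [
    {"A": (10.0, 2.0), "B": (10.0, 5.0), "C": None},
    {"A": (7.0, 1.0), "B": None, "C": None},
    {"A": (3.0, 4.0), "B": (2.0, 9.0), "C": (3.0, 4.0)},
]
rank_rows = [lexicographic_ranks(inst) for inst in instances]
statistic = friedman_statistic(rank_rows)
```

Because every instance yields a complete set of ranks even when some runs are infeasible, the resulting table has no missing entries and can be passed to a rank-based test. The statistic above omits the tie correction; an established routine such as `scipy.stats.friedmanchisquare` would normally be preferred for the actual test and p-value.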