The SaTML '24 CNN Interpretability Competition: New Innovations for Concept-Level Interpretability
This competition addresses the challenge of reliably diagnosing trojans in CNNs for improved AI oversight, though its contribution is incremental because it builds on existing benchmarks.
The SaTML 2024 CNN Interpretability Competition tackled the problem of helping human crowd-workers identify trojans in CNNs at ImageNet scale, resulting in new techniques and a new record on the benchmark from Casper et al., 2023.
Interpretability techniques are valuable for helping humans understand and oversee AI systems. The SaTML 2024 CNN Interpretability Competition solicited novel methods for studying convolutional neural networks (CNNs) at ImageNet scale. The objective of the competition was to help human crowd-workers identify trojans in CNNs. This report showcases the methods and results of four featured competition entries. Helping humans reliably diagnose trojans with interpretability tools remains challenging. However, the competition's entries contributed new techniques and set a new record on the benchmark from Casper et al., 2023.