Reward-Punishment Symmetric Universal Intelligence
This work addresses a theoretical question in measuring the intelligence of AI agents. The contribution is incremental: it extends an existing framework rather than proposing a new one.
The paper extends the Legg-Hutter agent-environment framework to include punishments. It shows that when the background encodings and universal Turing machine admit certain Kolmogorov complexity symmetries, the resulting intelligence measure is symmetric about the origin, which implies that reward-ignoring agents have intelligence 0.
Can an agent's intelligence level be negative? We extend the Legg-Hutter agent-environment framework to include punishments and argue for an affirmative answer to that question. We show that if the background encodings and Universal Turing Machine (UTM) admit certain Kolmogorov complexity symmetries, then the resulting Legg-Hutter intelligence measure is symmetric about the origin. In particular, this implies reward-ignoring agents have Legg-Hutter intelligence 0 according to such UTMs.
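The symmetry argument can be illustrated with a small toy model. The sketch below is not the paper's construction: Kolmogorov complexity is uncomputable, so it substitutes a hypothetical `toy_complexity` function that is unchanged when rewards are negated, and it replaces the class of all computable environments with nine stateless two-action environments over a fixed horizon. All names (`dual_env`, `dual_agent`, `win_stay`, `alternate`) are illustrative assumptions. Under those assumptions, a complexity-weighted sum of agent performance, in the style of the Legg-Hutter measure, flips sign when an agent is replaced by its dual and is exactly 0 for an agent that ignores rewards.

```python
from fractions import Fraction
from itertools import product

HORIZON = 3            # number of interaction steps (assumption of this sketch)
REWARDS = (-1, 0, 1)   # negative values model punishments

# A toy environment is a pair (r0, r1): the fixed reward for action 0 and action 1.
# The set is closed under negating rewards.
ENVS = list(product(REWARDS, repeat=2))


def dual_env(env):
    """Environment with every reward negated."""
    return tuple(-r for r in env)


def toy_complexity(env):
    """Hypothetical stand-in for K(env); unchanged when rewards are negated."""
    return 1 + sum(1 for r in env if r != 0)


def weight(env):
    """Exact weight 2^(-complexity), kept as a Fraction to avoid rounding."""
    return Fraction(1, 2 ** toy_complexity(env))


def dual_agent(agent):
    """Agent that acts as `agent` would if all past rewards were negated."""
    return lambda history: agent([(a, -r) for a, r in history])


def value(agent, env):
    """Total reward collected by `agent` in `env` over HORIZON steps."""
    history, total = [], 0
    for _ in range(HORIZON):
        action = agent(history)
        reward = env[action]
        history.append((action, reward))
        total += reward
    return total


def intelligence(agent):
    """Complexity-weighted sum of values over the toy environment class."""
    return sum(weight(env) * value(agent, env) for env in ENVS)


def win_stay(history):
    """Reward-sensitive agent: repeat an action only after a positive reward."""
    if not history:
        return 0
    last_action, last_reward = history[-1]
    return last_action if last_reward > 0 else 1 - last_action


def alternate(history):
    """Reward-ignoring agent: the action depends only on the step count."""
    return len(history) % 2


# Symmetry about the origin, and zero score for the reward-ignoring agent.
assert intelligence(dual_agent(win_stay)) == -intelligence(win_stay)
assert intelligence(alternate) == 0
```

The first assertion holds because the dual agent in the dual environment takes the same actions as the original agent in the original environment while receiving negated rewards, and the weights of an environment and its dual coincide. The second follows because a reward-ignoring agent is its own dual, so its score must equal its own negation.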