LG · AI · May 29, 2025

BIRD: Behavior Induction via Representation-structure Distillation

arXiv:2505.23933v1 · h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses scalable and safe AI deployment by enabling efficient transfer of aligned behaviors. The advance is incremental, since it builds on existing distillation and alignment methods.

The paper tackles the challenge of transferring human-aligned behaviors, such as robustness and fairness, from a teacher model to a student model trained on a different task or data distribution. In image classification, it improves robust accuracy by up to 16% over the next strongest baseline, and three interpretable properties of the teacher's representations explain up to 85% of the variance in transfer success.

Human-aligned deep learning models exhibit behaviors consistent with human values, such as robustness, fairness, and honesty. Transferring these behavioral properties to models trained on different tasks or data distributions remains challenging: aligned behavior is easily forgotten during fine-tuning, and collecting task-specific data that preserves this behavior can be prohibitively costly. We introduce BIRD (Behavior Induction via Representation-structure Distillation), a flexible framework for transferring aligned behavior by matching the internal representation structure of a student model to that of a teacher. Applied to out-of-distribution robustness in image classification, BIRD outperforms fine-tuning, transfer learning, and continual learning methods, improving robust accuracy by up to 16% over the next strongest baseline. It remains effective even when the teacher is trained on a much simpler dataset and is $25 \times$ smaller than the student. In a large-scale study of over 400 teacher-student pairs, we show that three interpretable and computable properties of the teacher's representations (i.e., task relevance, behavioral relevance, and complementary knowledge) explain up to 85% of the variance in transfer success. These insights offer practical guidance for teacher selection and design. BIRD turns small, well-aligned models into scalable alignment seeds, removing a key bottleneck in deploying safe AI systems in the wild.
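
To make "matching representation structure" concrete, the sketch below scores how closely the pairwise-similarity structure of a batch of student features follows that of the teacher's features on the same inputs. The abstract does not state which objective BIRD uses, so linear centered kernel alignment (CKA) is used here purely as an illustrative assumption. The function names and toy features are hypothetical, and a real training loop would compute this in a differentiable framework rather than plain Python.

```python
import math

def gram(X):
    """Linear Gram matrix K[i][j] = <x_i, x_j> over a batch of feature vectors."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]

def center(K):
    """Double-center a square matrix (equivalent to mean-centering the features)."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + total for j in range(n)] for i in range(n)]

def frob_inner(A, B):
    """Frobenius inner product of two equally sized matrices."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def linear_cka(X, Y):
    """Linear CKA between two batches of features for the same examples."""
    Kx, Ky = center(gram(X)), center(gram(Y))
    return frob_inner(Kx, Ky) / math.sqrt(frob_inner(Kx, Kx) * frob_inner(Ky, Ky))

def structure_loss(student_feats, teacher_feats):
    """Assumed distillation term: zero when the two similarity structures agree."""
    return 1.0 - linear_cka(student_feats, teacher_feats)

# Toy features for four inputs (hypothetical values, not from the paper).
teacher = [[1, 0], [0, 1], [-1, 0], [0, -1]]
# A wider student whose features are a rotated, rescaled copy of the teacher's.
student_aligned = [[0, 2, 0], [-2, 0, 0], [0, -2, 0], [2, 0, 0]]
# A student that collapses pairs of inputs the teacher keeps apart.
student_collapsed = [[1, 0], [1, 0], [-1, 0], [-1, 0]]

loss_aligned = structure_loss(student_aligned, teacher)
loss_collapsed = structure_loss(student_collapsed, teacher)
```

Because the score compares Gram matrices over examples rather than raw activations, teacher and student may have different feature widths, which fits the abstract's setting of a teacher far smaller than the student. In this toy case the aligned student incurs essentially zero loss, while the collapsed student is penalized.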

