Landseer: Exploring the Machine Learning Defense Landscape
For ML practitioners and researchers, this work provides a tool to systematically evaluate and compose defenses, addressing the underexplored problem of defense composition in real-world deployments.
Landseer is a modular framework for composing and evaluating machine learning defenses, addressing the lack of systematic study of defense composition. A preliminary study of 35 state-of-the-art defenses revealed gaps in replicability and provided insights into the challenges of integrating multiple defenses.
Machine learning (ML) systems face diverse threats that undermine robustness, privacy, and fairness. Although many defenses have been proposed, each typically addresses a single risk in isolation. Real-world deployments, however, must compose these defenses to meet multiple guarantees simultaneously. Composing defenses is complex and not well understood, and its impact on performance and security remains unclear. We present Landseer, a modular framework for integrating ML defenses into the ML lifecycle and systematically evaluating their composition. Landseer encapsulates defenses as containerized modules, allowing existing and new techniques to be plugged in with minimal effort. Its evaluation engine automates experiments across multiple metrics, supporting the study of defenses both individually and in combination. In a preliminary study, we identified 35 state-of-the-art ML defenses. After filtering for reproducibility, we analyzed their performance using Landseer's unified evaluation process. Our findings reveal gaps in replicability across defense families and provide insights into the challenges and opportunities in integrating multiple defenses, establishing a foundation for improving the reliability of ML systems.
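To make the idea of evaluating defenses "both individually and in combination" concrete, the following is a minimal sketch of how such an evaluation grid could be enumerated. It is not Landseer's API. The stage names, the `Defense` class, the `shift` helper, the defense names, and all metric values are hypothetical placeholders chosen for illustration, and a plain Python callable stands in for a containerized module. The additive metric effects are a deliberate simplification; real defenses may interact in non-additive ways, which is the kind of behavior a composition study would aim to measure.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, Iterator, List, Tuple

Metrics = Dict[str, float]

# Hypothetical lifecycle stages; the abstract does not enumerate the real ones.
STAGES = ("pre_training", "in_training", "post_training")


@dataclass(frozen=True)
class Defense:
    """Stand-in for a containerized defense module (assumption, not Landseer's type)."""
    name: str
    stage: str
    apply: Callable[[Metrics], Metrics]


def shift(**deltas: float) -> Callable[[Metrics], Metrics]:
    """Build a toy effect that adds fixed deltas to the given metrics."""
    def _apply(metrics: Metrics) -> Metrics:
        out = dict(metrics)
        for key, delta in deltas.items():
            out[key] = out.get(key, 0.0) + delta
        return out
    return _apply


def enumerate_compositions(defenses: List[Defense]) -> Iterator[Tuple[Defense, ...]]:
    """Yield every combination that uses at most one defense per stage."""
    per_stage = [[None] + [d for d in defenses if d.stage == s] for s in STAGES]
    for combo in product(*per_stage):
        yield tuple(d for d in combo if d is not None)


def evaluate(combo: Tuple[Defense, ...], baseline: Metrics) -> Metrics:
    """Apply the defenses in stage order and return the resulting metrics."""
    metrics = dict(baseline)
    for defense in combo:
        metrics = defense.apply(metrics)
    return metrics


# Placeholder defenses and numbers: illustrative only, not measured results.
defenses = [
    Defense("defense_a", "pre_training", shift(accuracy=-0.01, fairness=0.05)),
    Defense("defense_b", "in_training", shift(accuracy=-0.03, robustness=0.30)),
    Defense("defense_c", "in_training", shift(accuracy=-0.05, privacy=0.40)),
    Defense("defense_d", "post_training", shift(accuracy=-0.01, privacy=0.10)),
]
baseline = {"accuracy": 0.90, "robustness": 0.10, "privacy": 0.10, "fairness": 0.50}

results = {
    " + ".join(d.name for d in combo) or "no defense": evaluate(combo, baseline)
    for combo in enumerate_compositions(defenses)
}

for label, metrics in results.items():
    print(label, {k: round(v, 2) for k, v in metrics.items()})
```

With one, two, and one candidate defenses in the three hypothetical stages, the grid contains (1+1) x (2+1) x (1+1) = 12 configurations, including the undefended baseline. The grid size grows multiplicatively with the number of defenses per stage, which is one reason automating such experiments is useful.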