Vaishali Vinay

AINov 25, 2025

Failure Modes in LLM Systems: A System-Level Taxonomy for Reliable AI Applications

Vaishali Vinay

Large language models (LLMs) are being rapidly integrated into decision-support tools, automation workflows, and AI-enabled software systems. However, their behavior in production environments remains poorly understood, and their failure patterns differ fundamentally from those of traditional machine learning models. This paper presents a system-level taxonomy of fifteen hidden failure modes that arise in real-world LLM applications, including multi-step reasoning drift, latent inconsistency, context-boundary degradation, incorrect tool invocation, version drift, and cost-driven performance collapse. Using this taxonomy, we analyze the growing gap in evaluation and monitoring practices: existing benchmarks measure knowledge or reasoning but provide little insight into stability, reproducibility, drift, or workflow integration. We further examine the production challenges associated with deploying LLMs - including observability limitations, cost constraints, and update-induced regressions - and outline high-level design principles for building reliable, maintainable, and cost-aware LLM systems. Finally, we outline high-level design principles for building reliable, maintainable, and cost-aware LLM-based systems. By framing LLM reliability as a system-engineering problem rather than a purely model-centric one, this work provides an analytical foundation for future research on evaluation methodology, AI system robustness, and dependable LLM deployment.

CRDec 5, 2024

SCADE: Scalable Framework for Anomaly Detection in High-Performance System

Vaishali Vinay, Anjali Mangal · microsoft-research

As command-line interfaces remain integral to high-performance computing environments, the risk of exploitation through stealthy and complex command-line abuse grows. Conventional security solutions struggle to detect these anomalies due to their context-specific nature, lack of labeled data, and the prevalence of sophisticated attacks like Living-off-the-Land (LOL). To address this gap, we introduce the Scalable Command-Line Anomaly Detection Engine (SCADE), a framework that combines global statistical models with local context-specific analysis for unsupervised anomaly detection. SCADE leverages novel statistical methods, including BM25 and Log Entropy, alongside dynamic thresholding to adaptively detect rare, malicious command-line patterns in low signal-to-noise ratio (SNR) environments. Experimental results show that SCADE achieves above 98% SNR in identifying anomalous behavior while minimizing false positives. Designed for scalability and precision, SCADE provides an innovative, metadata-enriched approach to anomaly detection, offering a robust solution for cybersecurity in high-computation environments. This work presents SCADE's architecture, detection methodology, and its potential for enhancing anomaly detection in enterprise systems. We argue that SCADE represents a significant advancement in unsupervised anomaly detection, offering a robust, adaptive framework for security analysts and researchers seeking to enhance detection accuracy in high-computation environments.

Vaishali Vinay

2 Papers