CY Feb 19, 2025
AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons
Shaona Ghosh, Heather Frase, Adina Williams et al. · deepmind, stanford
The rapid advancement and deployment of AI systems have created an urgent need for standard safety-evaluation frameworks. This paper introduces AILuminate v1.0, the first comprehensive industry-standard benchmark for assessing AI-product risk and reliability. Its development employed an open process that included participants from multiple fields. The benchmark evaluates an AI system's resistance to prompts designed to elicit dangerous, illegal, or undesirable behavior in 12 hazard categories, including violent crimes, nonviolent crimes, sex-related crimes, child sexual exploitation, indiscriminate weapons, suicide and self-harm, intellectual property, privacy, defamation, hate, sexual content, and specialized advice (election, financial, health, legal). Our method incorporates a complete assessment standard, extensive prompt datasets, a novel evaluation framework, a grading and reporting system, and the technical as well as organizational infrastructure for long-term support and evolution. In particular, the benchmark employs an understandable five-tier grading scale (Poor to Excellent) and incorporates an innovative entropy-based system-response evaluation. In addition to unveiling the benchmark, this report also identifies limitations of our method and of building safety benchmarks generally, including evaluator uncertainty and the constraints of single-turn interactions. This work represents a crucial step toward establishing global standards for AI risk and reliability evaluation while acknowledging the need for continued development in areas such as multiturn interactions, multimodal understanding, coverage of additional languages, and emerging hazard categories. Our findings provide valuable insights for model developers, system integrators, and policymakers working to promote safer AI deployment.
CR Aug 31, 2020
Connecting Web Event Forecasting with Anomaly Detection: A Case Study on Enterprise Web Applications Using Self-Supervised Neural Networks
Xiaoyong Yuan, Lei Ding, Malek Ben Salem et al.
Recently, web applications have been widely used in enterprises to help employees carry out business processes effectively and efficiently. Forecasting upcoming web events in enterprise web applications can be beneficial in many ways, such as efficient caching and recommendation. In this paper, we present DeepEvent, a web event forecasting approach for enterprise web applications that supports better anomaly detection. DeepEvent includes three key features: web-specific neural networks that take into account the characteristics of sequential web events, self-supervised learning techniques to overcome the scarcity of labeled data, and sequence embedding techniques to integrate contextual events and capture dependencies among web events. We evaluate DeepEvent on web events collected from six real-world enterprise web applications. Our experimental results demonstrate that DeepEvent is effective in forecasting sequential web events and detecting web-based anomalies. DeepEvent provides a context-based system for researchers and practitioners to better forecast web events with situational awareness.
CR Aug 15, 2020
Are Smart Home Devices Abandoning IPV Victims?
Ahmed Alshehri, Malek Ben Salem, Lei Ding
Smart home devices have brought us many benefits, such as advanced security, convenience, and entertainment. However, these devices have also had unintended consequences, such as giving device owners ultimate power over their intimate partners in the same household, which can lead to tech-facilitated domestic abuse (tech-abuse), as recent research has shown. In this paper, we systematize findings on tech-abuse in smart homes. We show that domestic abuse and Intimate Partner Violence (IPV) in smart homes is more effective and less risky for abusers, while victims find it more harmful and harder to protect themselves from. We present a comprehensive analysis of all the phases of abuse in smart homes and categorize risks and needs in each phase. We conduct a technical analysis of current smart home technologies to shed light on their limitations. We also summarize recent recommendations to combat tech-abuse in smart homes and focus on their potential and shortcomings. Unsurprisingly, we find that many recommendations conflict with each other due to a lack of understanding of the phases of abuse in smart homes. We propose desirable properties for designing abuse-resistant smart home devices for all the phases of abuse. The research community can benefit from our analysis and recommendations to move forward with a focus on filling the blind spots of existing smart home devices' safety measures and building appropriate safety measures that consider tech-abuse threats in smart homes.