Darshini Ramiah

h-index20
2papers

2 Papers

AIJan 22
Improving Methodologies for LLM Evaluations Across Global Languages

Akriti Vij, Benjamin Chua, Darshini Ramiah et al.

As frontier AI models are deployed globally, it is essential that their behaviour remains safe and reliable across diverse linguistic and cultural contexts. To examine how current model safeguards hold up in such settings, participants from the International Network for Advanced AI Measurement, Evaluation and Science, including representatives from Singapore, Japan, Australia, Canada, the EU, France, Kenya, South Korea and the UK conducted a joint multilingual evaluation exercise. Led by Singapore AISI, two open-weight models were tested across ten languages spanning high and low resourced groups: Cantonese English, Farsi, French, Japanese, Korean, Kiswahili, Malay, Mandarin Chinese and Telugu. Over 6,000 newly translated prompts were evaluated across five harm categories (privacy, non-violent crime, violent crime, intellectual property and jailbreak robustness), using both LLM-as-a-judge and human annotation. The exercise shows how safety behaviours can vary across languages. These include differences in safeguard robustness across languages and harm types and variation in evaluator reliability (LLM-as-judge vs. human review). Further, it also generated methodological insights for improving multilingual safety evaluations, such as the need for culturally contextualised translations, stress-tested evaluator prompts and clearer human annotation guidelines. This work represents an initial step toward a shared framework for multilingual safety testing of advanced AI systems and calls for continued collaboration with the wider research community and industry.

CYOct 22, 2025
Ask What Your Country Can Do For You: Towards a Public Red Teaming Model

Wm. Matthew Kennedy, Cigdem Patlak, Jayraj Dave et al.

AI systems have the potential to produce both benefits and harms, but without rigorous and ongoing adversarial evaluation, AI actors will struggle to assess the breadth and magnitude of the AI risk surface. Researchers from the field of systems design have developed several effective sociotechnical AI evaluation and red teaming techniques targeting bias, hate speech, mis/disinformation, and other documented harm classes. However, as increasingly sophisticated AI systems are released into high-stakes sectors (such as education, healthcare, and intelligence-gathering), our current evaluation and monitoring methods are proving less and less capable of delivering effective oversight. In order to actually deliver responsible AI and to ensure AI's harms are fully understood and its security vulnerabilities mitigated, pioneering new approaches to close this "responsibility gap" are now more urgent than ever. In this paper, we propose one such approach, the cooperative public AI red-teaming exercise, and discuss early results of its prior pilot implementations. This approach is intertwined with CAMLIS itself: the first in-person public demonstrator exercise was held in conjunction with CAMLIS 2024. We review the operational design and results of this exercise, the prior National Institute of Standards and Technology (NIST)'s Assessing the Risks and Impacts of AI (ARIA) pilot exercise, and another similar exercise conducted with the Singapore Infocomm Media Development Authority (IMDA). Ultimately, we argue that this approach is both capable of delivering meaningful results and is also scalable to many AI developing jurisdictions.