Phillip Dawson

7.7 · CY · Mar 20

Aiming for AI Interoperability: Challenges and Opportunities

Benjamin Faveri, Craig Shank, Richard Whitt et al.

The Aiming for AI Interoperability report investigates the ongoing challenge of achieving regulatory and technical AI interoperability as national and global AI governance efforts proliferate. Here, technical interoperability is the ability of AI systems and networks to function together, and regulatory interoperability is the consistency and overlap of rules across jurisdictions and sectors. The report observes an accelerating trend: governments, standard-setting bodies, and private firms are drafting, passing, and implementing new AI laws, policies, and frameworks at a staggering pace, resulting in fragmentation and confusion for both private- and public-sector actors.

13.9 · CL · May 14, 2025 · Code

A Comprehensive Analysis of Large Language Model Outputs: Similarity, Diversity, and Bias

Brandon Smith, Mohamed Reda Bouadjenek, Tahsin Alamgir Kheya et al.

Large Language Models (LLMs) represent a major step toward artificial general intelligence, significantly advancing our ability to interact with technology. While LLMs perform well on Natural Language Processing tasks (such as translation, generation, code writing, and summarization), questions remain about their output similarity, variability, and ethical implications. For instance, how similar are texts generated by the same model? How does this compare across different models? And which models best uphold ethical standards? To investigate, we used 5,000 prompts spanning diverse tasks like generation, explanation, and rewriting. This resulted in approximately 3 million texts from 12 LLMs, including proprietary and open-source systems from OpenAI, Google, Microsoft, Meta, and Mistral. Key findings include: (1) outputs from the same LLM are more similar to each other than to human-written texts; (2) models like WizardLM-2-8x22b generate highly similar outputs, while GPT-4 produces more varied responses; (3) LLM writing styles differ significantly, with Llama 3 and Mistral showing higher similarity, and GPT-4 standing out for distinctiveness; (4) differences in vocabulary and tone underscore the linguistic uniqueness of LLM-generated content; (5) some LLMs demonstrate greater gender balance and reduced bias. These results offer new insights into the behavior and diversity of LLM outputs, helping guide future development and ethical evaluation.

2 Papers