CYAug 27, 2022Code
Conversion of Legal Agreements into Smart Legal Contracts using NLPEason Chen, Niall Roche, Yuen-Hsien Tseng et al.
A Smart Legal Contract (SLC) is a specialized digital agreement comprising natural language and computable components. The Accord Project provides an open-source SLC framework containing three main modules: Cicero, Concerto, and Ergo. Currently, we need lawyers, programmers, and clients to work together with great effort to create a usable SLC using the Accord Project. This paper proposes a pipeline to automate the SLC creation process with several Natural Language Processing (NLP) models to convert law contracts to the Accord Project's Concerto model. After evaluating the proposed pipeline, we discovered that our NER pipeline accurately detects CiceroMark from Accord Project template text with an accuracy of 0.8. Additionally, our Question Answering method can extract one-third of the Concerto variables from the template text. We also delve into some limitations and possible future research for the proposed pipeline. Finally, we describe a web interface enabling users to build SLCs. This interface leverages the proposed pipeline to convert text documents to Smart Legal Contracts by using NLP models.
IRAug 23, 2023
Evolution of ESG-focused DLT Research: An NLP Analysis of the LiteratureWalter Hernandez Cruz, Kamil Tylinski, Alastair Moore et al.
Distributed Ledger Technology (DLT) faces increasing environmental scrutiny, particularly concerning the energy consumption of the Proof of Work (PoW) consensus mechanism and broader Environmental, Social, and Governance (ESG) issues. However, existing systematic literature reviews of DLT rely on limited analyses of citations, abstracts, and keywords, failing to fully capture the field's complexity and ESG concerns. We address these challenges by analyzing the full text of 24,539 publications using Natural Language Processing (NLP) with our manually labeled Named Entity Recognition (NER) dataset of 39,427 entities for DLT. This methodology identified 505 key publications at the DLT/ESG intersection, enabling comprehensive domain analysis. Our combined NLP and temporal graph analysis reveals critical trends in DLT evolution and ESG impacts, including cryptography and peer-to-peer networks research's foundational influence, Bitcoin's persistent impact on research and environmental concerns (a "Lindy effect"), Ethereum's catalytic role on Proof of Stake (PoS) and smart contract adoption, and the industry's progressive shift toward energy-efficient consensus mechanisms. Our contributions include the first DLT-specific NER dataset addressing the scarcity of high-quality labeled NLP data in blockchain research, a methodology integrating NLP and temporal graph analysis for large-scale interdisciplinary literature reviews, and the first NLP-driven literature review focusing on DLT's ESG aspects.
AIOct 2, 2025Code
Orchestrating Human-AI Teams: The Manager Agent as a Unifying Research ChallengeCharlie Masters, Advaith Vellanki, Jiangbo Shangguan et al.
While agentic AI has advanced in automating individual tasks, managing complex multi-agent workflows remains a challenging problem. This paper presents a research vision for autonomous agentic systems that orchestrate collaboration within dynamic human-AI teams. We propose the Autonomous Manager Agent as a core challenge: an agent that decomposes complex goals into task graphs, allocates tasks to human and AI workers, monitors progress, adapts to changing conditions, and maintains transparent stakeholder communication. We formalize workflow management as a Partially Observable Stochastic Game and identify four foundational challenges: (1) compositional reasoning for hierarchical decomposition, (2) multi-objective optimization under shifting preferences, (3) coordination and planning in ad hoc teams, and (4) governance and compliance by design. To advance this agenda, we release MA-Gym, an open-source simulation and evaluation framework for multi-agent workflow orchestration. Evaluating GPT-5-based Manager Agents across 20 workflows, we find they struggle to jointly optimize for goal completion, constraint adherence, and workflow runtime - underscoring workflow management as a difficult open problem. We conclude with organizational and ethical implications of autonomous management systems.
91.5HCMar 26
Toward a Human-AI Task Tensor: A Taxonomy for Organizing Work in the Age of Generative AIAnil R. Doshi, Alastair Moore
We introduce a framework for understanding the impact of generative AI on human work, which we call the human-AI task tensor. A tensor is a structured framework that organizes tasks along multiple interdependent dimensions. Our human-AI task tensor introduces a systematic approach to studying how humans and AI interact to perform tasks, and has eight dimensions: task definition, AI integration, interaction modality, audit requirement, output definition, decision-making authority, AI structure, and human persona. After describing the eight dimensions of the tensor, we provide illustrative frameworks (derived from projections of the tensor) and a human-AI task canvas that provide analytical tractability and practical insight for organizational decision-making. We demonstrate how the human-AI task tensor can be used to organize emerging and future research on generative AI. We propose that the human-AI task tensor offers a starting point for understanding how work will be performed with the emergence of generative AI.
CLApr 18, 2021
Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order SensitivityYao Lu, Max Bartolo, Alastair Moore et al.
When primed with only a handful of training samples, very large, pretrained language models such as GPT-3 have shown competitive results when compared to fully-supervised, fine-tuned, large, pretrained language models. We demonstrate that the order in which the samples are provided can make the difference between near state-of-the-art and random guess performance: essentially some permutations are "fantastic" and some not. We analyse this phenomenon in detail, establishing that: it is present across model sizes (even for the largest current models), it is not related to a specific subset of samples, and that a given good permutation for one model is not transferable to another. While one could use a development set to determine which permutations are performant, this would deviate from the true few-shot setting as it requires additional annotated data. Instead, we use the generative nature of language models to construct an artificial development set and based on entropy statistics of the candidate permutations on this set, we identify performant prompts. Our method yields a 13% relative improvement for GPT-family models across eleven different established text classification tasks.