CL | Jul 10, 2025

Audit, Alignment, and Optimization of LM-Powered Subroutines with Application to Public Comment Processing

Reilly Raab, Mike Parker, Dan Nally, Sadie Montgomery, Anastasia Bernat, Sai Munikoti, Sameera Horawalavithana

arXiv:2507.08109v1 | h-index: 10

Originality: Synthesis-oriented

AI Analysis

The work addresses safety and transparency concerns that hinder the adoption of LMs in critical domains such as environmental policy, though its contribution is incremental: it applies existing methods to a new application.

The paper tackles the challenge of responsibly leveraging language models in real-world tasks by proposing a framework for LM-powered subroutines that are auditable and improvable with human feedback, and it demonstrates this in public comment processing with quantitative evaluation against historical ground-truth data.

The advent of language models (LMs) has the potential to dramatically accelerate tasks that can be cast as text processing; however, real-world adoption is hindered by concerns regarding safety, explainability, and bias. How can we responsibly leverage LMs in a transparent, auditable manner, minimizing risk and allowing human experts to focus on informed decision-making rather than data processing or prompt engineering? In this work, we propose a framework for declaring statically typed, LM-powered subroutines (i.e., callable, function-like procedures) for use within conventional asynchronous code, such that sparse feedback from human experts is used to improve the performance of each subroutine online (i.e., during use). In our implementation, all LM-produced artifacts (i.e., prompts, inputs, outputs, and data dependencies) are recorded and exposed to audit on demand. We package this framework as a library to support its adoption and continued development. While this framework may be applicable across several real-world decision workflows (e.g., in healthcare and legal fields), we evaluate it in the context of public comment processing as mandated by the National Environmental Policy Act (NEPA) of 1969. Specifically, we use this framework to develop "CommentNEPA," an application that compiles, organizes, and summarizes a corpus of public commentary submitted in response to a project requiring environmental review. We quantitatively evaluate the application by comparing its outputs (when operating without human feedback) to historical "ground-truth" data as labelled by human annotators during the preparation of official environmental impact statements.
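To make the idea in the abstract concrete, the following is a minimal Python sketch of what a typed, auditable, feedback-improvable LM subroutine could look like. It is not the authors' library or its API: the names `Subroutine`, `AuditRecord`, `give_feedback`, and `stub_lm` are all hypothetical, the "LM" is a deterministic stand-in so the example runs offline, and feedback is folded in as few-shot examples, which is only one assumed way to improve a subroutine online. Data-dependency tracking, which the abstract also mentions, is omitted.

```python
import asyncio
import json
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable


@dataclass
class AuditRecord:
    """One logged call: everything needed to inspect it later."""
    subroutine: str
    prompt: str
    inputs: dict
    output: Any


@dataclass
class Subroutine:
    """Hypothetical LM-powered subroutine with a declared output type."""
    name: str
    instruction: str
    output_type: type
    lm: Callable[[str], Awaitable[str]]
    examples: list = field(default_factory=list)   # expert-corrected (inputs, output) pairs
    audit_log: list = field(default_factory=list)  # every call is recorded here

    def build_prompt(self, inputs: dict) -> str:
        # Expert feedback is replayed as few-shot examples ahead of the new input.
        shots = "".join(
            f"Input: {json.dumps(i)}\nOutput: {json.dumps(o)}\n"
            for i, o in self.examples
        )
        return f"{self.instruction}\n{shots}Input: {json.dumps(inputs)}\nOutput:"

    async def __call__(self, **inputs: Any) -> Any:
        prompt = self.build_prompt(inputs)
        output = json.loads(await self.lm(prompt))
        # Runtime check standing in for the static typing described in the abstract.
        if not isinstance(output, self.output_type):
            raise TypeError(f"{self.name}: expected {self.output_type.__name__}")
        self.audit_log.append(AuditRecord(self.name, prompt, inputs, output))
        return output

    def give_feedback(self, inputs: dict, corrected: Any) -> None:
        # Sparse human feedback: store a corrected example for future prompts.
        self.examples.append((inputs, corrected))


async def stub_lm(prompt: str) -> str:
    """Stand-in for a real model: echoes the last example output, else "other"."""
    shown = [ln[len("Output: "):] for ln in prompt.splitlines()
             if ln.startswith("Output: ")]
    return shown[-1] if shown else json.dumps("other")


async def demo():
    classify = Subroutine(
        "classify_comment",
        "Assign a topic label to the public comment.",
        str,
        stub_lm,
    )
    first = await classify(comment="Protect the elk migration corridor.")
    # An expert corrects the first answer; later calls see the correction.
    classify.give_feedback({"comment": "Protect the elk migration corridor."}, "wildlife")
    second = await classify(comment="The road will disturb nesting birds.")
    return classify, first, second


classify, first, second = asyncio.run(demo())
print(first, second, len(classify.audit_log))
```

In this sketch the second call returns a different label than the first only because the stub echoes the expert's correction; a real model would generalize from the example rather than copy it. The audit log keeps the exact prompt for each call, so a reviewer can see which feedback examples influenced a given output.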
