Task-Level AI Readiness Assessment for Business Process Management: The T-IPO Model and LARA Matrix in Financial-Services IT Operations

arXiv:2605.16297 · 6.8

Predicted impact: top 56% in CY · last 90 days
Originality: Incremental advance

AI Analysis

For business process managers and AI practitioners, this provides a more granular task-level readiness assessment than existing activity-level frameworks, though the improvement in predictive utility is suggestive rather than proven.

The paper introduces T-IPO and LARA, two artifacts for assessing LLM agent readiness at the task level in business processes, evaluated on 127 financial-services IT tasks. Results show high inter-rater agreement (Fleiss' κ = 0.80) and a monotonic decay in auto-completion from 95% at L1 to about 40% at L3.

Which tasks inside an enterprise workflow can a large-language-model agent reliably handle, and under what conditions? Most business process modeling frameworks still answer this at the activity level, even though a single activity can bundle work of radically different difficulty. This paper takes the analysis one level finer, to the individual task. We describe two design artifacts developed in a financial-services IT setting: T-IPO, which represents each task as an eight-element tuple, and LARA (LLM Agent Readiness Assessment), a five-dimension rubric that scores a task's readiness for agent substitution. Compliance Sensitivity carries $1.5\times$ weight, a value we fixed through a three-round Delphi study and cross-checked with AHP. The rubric produces four levels, L1 to L4, and applies a floor rule so that a task with maximum compliance load cannot be classified below L3 no matter what the other scores say. Both artifacts sit inside a larger methodology (PARTIS) that we map onto the BWW ontology in Section 3. We evaluate the instruments across 127 tasks. Inter-rater agreement reaches Fleiss' $\kappa = 0.80$; a replication at three further institutions returns $\kappa = 0.73$. A controlled comparison against activity-level assessment suggests, though does not prove, an improvement in predictive utility at the task level. A pilot deployment of 120 task instances confirms that auto-completion decays monotonically from $95\%$ at L1 through about $70\%$ at L2 to about $40\%$ at L3. Exploratory factor analysis points to a two-factor structure: task readiness seems to be determined jointly by cognitive-execution complexity and governance-compliance intensity. We close with a recalibration procedure (LARA-TCA) so the rubric can keep pace with evolving LLM capabilities.
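
The weighting and floor rule described in the abstract can be illustrated with a minimal sketch. It is not the paper's scoring procedure. The abstract names only Compliance Sensitivity, its $1.5\times$ weight, the four levels, and the floor at L3. The other four dimension names (`dim_a` to `dim_d`), the 1-to-5 scoring scale, the use of a weighted mean, and the level cut-offs are all assumptions made here for illustration.

```python
# Only "compliance_sensitivity" and its 1.5x weight come from the abstract.
# The other dimension names, the 1-5 scale, and the cut-offs are hypothetical.
DIMENSIONS = ("dim_a", "dim_b", "dim_c", "dim_d", "compliance_sensitivity")
WEIGHTS = {name: 1.0 for name in DIMENSIONS}
WEIGHTS["compliance_sensitivity"] = 1.5

MAX_SCORE = 5                     # assumed top of the scoring scale
LEVEL_CUTOFFS = (2.0, 3.0, 4.0)   # assumed upper bounds for L1, L2, L3
FLOOR_LEVEL = 3                   # floor stated in the abstract


def lara_level(scores: dict) -> int:
    """Return a readiness level 1..4 (L1 = most ready) for one task."""
    missing = set(DIMENSIONS) - set(scores)
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")

    # Weighted mean of the five dimension scores.
    total_weight = sum(WEIGHTS.values())
    weighted_mean = sum(WEIGHTS[d] * scores[d] for d in DIMENSIONS) / total_weight

    # Each cut-off the weighted mean exceeds pushes the task one level up.
    level = 1 + sum(weighted_mean > cutoff for cutoff in LEVEL_CUTOFFS)

    # Floor rule: maximum compliance load forces at least L3,
    # regardless of the other four scores.
    if scores["compliance_sensitivity"] == MAX_SCORE:
        level = max(level, FLOOR_LEVEL)
    return level


easy_but_regulated = {
    "dim_a": 1, "dim_b": 1, "dim_c": 1, "dim_d": 1,
    "compliance_sensitivity": 5,
}
print(lara_level(easy_but_regulated))
```

In this sketch the example task scores low on every assumed dimension except compliance, so its weighted mean alone would place it at L2; the floor rule is what lifts it to L3.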
