Anna Leontjeva

h-index7

8papers

395citations

Novelty44%

AI Score48

Ranked #26,902 of 194,257 authors (top 14%)#6,406 in LG (top 16%)

8 Papers

14.4SEJul 16

Beyond Generalist LLMs: Specialist Agentic Systems for Structured Code Workflow Execution

Harris Borman, Herman Wandabwa, Fusun Yu et al.

Large Language Models (LLMs) have accelerated the adoption of software development agents, now widely available as Integrated Development Environment (IDE) extensions and standalone applications. While these agents are typically general-purpose, it remains unclear whether specialist agents justify their additional development effort. We investigate this question in the context of business process automation, focusing on the transformation of Business Process Model and Notation (BPMN) diagrams into executable agentic workflows. Since BPMN specifies explicit control-flow semantics, we focus on deterministic workflows in which a fixed process model and inputs uniquely determine the executed path. We introduce a specialist workflow for this task and compare it against generalist agents such as Roo and Cline. Our results show that the specialist solution produces agents that outperform generalist baselines by approximately 9-20 percentage points in tool-use exactness, 2-4x in penalty-adjusted latency, and 3x fewer tool-call errors, while reducing generation token cost by over 95% and eliminating repair iterations. We also find that generalist agents generate code inconsistently in both functionality and quality, limiting their suitability for industrial settings where reliability and maintainability are essential.

0.5CLMar 10, 2023

Detection of Abuse in Financial Transaction Descriptions Using Machine Learning

Anna Leontjeva, Genevieve Richards, Kaavya Sriskandaraja et al.

Since introducing changes to the New Payments Platform (NPP) to include longer messages as payment descriptions, it has been identified that people are now using it for communication, and in some cases, the system was being used as a targeted form of domestic and family violence. This type of tech-assisted abuse poses new challenges in terms of identification, actions and approaches to rectify this behaviour. Commonwealth Bank of Australia's Artificial Intelligence Labs team (CBA AI Labs) has developed a new system using advances in deep learning models for natural language processing (NLP) to create a powerful abuse detector that periodically scores all the transactions, and identifies cases of high-risk abuse in millions of records. In this paper, we describe the problem of tech-assisted abuse in the context of banking services, outline the developed model and its performance, and the operating framework more broadly.

4.9CLMay 19

EmbGen: Teaching with Reassembled Corpora

Arun K Lenin, Kai Rouse, Andrea Nicastro et al.

Adapting small instruction-tuned models to specialized domains often relies on supervised fine-tuning (SFT) on curated instruction-response examples, which is expensive to collect at scale. Synthetic training examples generated by a teacher LLM from a domain corpus can reduce this cost, but existing pipelines can produce homogenized outputs and do not consistently capture cross-passage or cross-document dependencies. We introduce EmbGen, a synthetic data generation pipeline that decomposes a corpus into entity-description pairs, reassembles them using semantic structure inferred from embedding similarity, and then generates question-answer (QA) pairs via proximity, intra-cluster, and inter-cluster sampling with cluster-specialized system prompts. We evaluate EmbGen against EntiGraph, InstructLab and Knowledge-Instruct on three datasets of varied semantic heterogeneity, under fixed token budgets (5 and 20 million tokens). We use lexical overlap metrics, an LLM-as-a-judge rubric, and Binary Accuracy, a composed metric combining Factual Accuracy and Completeness for evaluation. EmbGen improves Binary Accuracy on the most heterogeneous dataset by 12.5% at 5M and 88.9% at 20M tokens budget, relative to the strongest baseline, while remaining competitive across other datasets with lower heterogeneity.

9.6AIMay 14

Prompt Segmentation and Annotation Optimisation: Controlling LLM Behaviour via Optimised Segment-Level Annotations

Devika Prasad, Luke Gerschwitz, Tong Li et al.

Prompt engineering is crucial for effective interaction with generative artificial intelligence systems, yet existing optimisation methods often operate over an unstructured and vast prompt space, leading to high computational costs and potential distortions of the original intent. We introduce Prompt Segmentation and Annotation Optimisation (PSAO), a structured prompt optimisation framework designed to improve prompt optimisation controllability and efficiency. PSAO decomposes a prompt into interpretable segments (e.g., sentences) and augments each with human-readable annotations (e.g., {not important}, {important}, {very important}). These annotations guide large language models (LLMs) in allocating focus and clarifying confusion during response generation. We formally define the segmentations and annotations and demonstrate that optimised segment-level annotations can lead to improved LLM responses, with the original prompt retained as a candidate in the optimisation space to prevent performance degradation. Empirical evaluations indicate that PSAO benefits from annotations in terms of improved reasoning accuracy and self-consistency. However, developing efficient methods for identifying optimal segmentations and annotations remains challenging and is reserved for future investigation. This work is intended as a proof of concept, demonstrating the feasibility and potential of segment-level annotation optimisation.

4.1LGAug 1, 2025

FeatureCuts: Feature Selection for Large Data by Optimizing the Cutoff

Andy Hu, Devika Prasad, Luiz Pizzato et al.

In machine learning, the process of feature selection involves finding a reduced subset of features that captures most of the information required to train an accurate and efficient model. This work presents FeatureCuts, a novel feature selection algorithm that adaptively selects the optimal feature cutoff after performing filter ranking. Evaluated on 14 publicly available datasets and one industry dataset, FeatureCuts achieved, on average, 15 percentage points more feature reduction and up to 99.6% less computation time while maintaining model performance, compared to existing state-of-the-art methods. When the selected features are used in a wrapper method such as Particle Swarm Optimization (PSO), it enables 25 percentage points more feature reduction, requires 66% less computation time, and maintains model performance when compared to PSO alone. The minimal overhead of FeatureCuts makes it scalable for large datasets typically seen in enterprise applications.

1.2STOct 28, 2024

Do LLM Personas Dream of Bull Markets? Comparing Human and AI Investment Strategies Through the Lens of the Five-Factor Model

Harris Borman, Anna Leontjeva, Luiz Pizzato et al.

Large Language Models (LLMs) have demonstrated the ability to adopt a personality and behave in a human-like manner. There is a large body of research that investigates the behavioural impacts of personality in less obvious areas such as investment attitudes or creative decision making. In this study, we investigated whether an LLM persona with a specific Big Five personality profile would perform an investment task similarly to a human with the same personality traits. We used a simulated investment task to determine if these results could be generalised into actual behaviours. In this simulated environment, our results show these personas produced meaningful behavioural differences in all assessed categories, with these behaviours generally being consistent with expectations derived from human research. We found that LLMs are able to generalise traits into expected behaviours in three areas: learning style, impulsivity and risk appetite while environmental attitudes could not be accurately represented. In addition, we showed that LLMs produce behaviour that is more reflective of human behaviour in a simulation environment compared to a survey environment.

6.1LGDec 20, 2017Code

Combining Static and Dynamic Features for Multivariate Sequence Classification

Anna Leontjeva, Ilya Kuzovkin

Model precision in a classification task is highly dependent on the feature space that is used to train the model. Moreover, whether the features are sequential or static will dictate which classification method can be applied as most of the machine learning algorithms are designed to deal with either one or another type of data. In real-life scenarios, however, it is often the case that both static and dynamic features are present, or can be extracted from the data. In this work, we demonstrate how generative models such as Hidden Markov Models (HMM) and Long Short-Term Memory (LSTM) artificial neural networks can be used to extract temporal information from the dynamic data. We explore how the extracted information can be combined with the static features in order to improve the classification performance. We evaluate the existing techniques and suggest a hybrid approach, which outperforms other methods on several public datasets.

7.7LGDec 12, 2017Code

Temporal Stability in Predictive Process Monitoring

Irene Teinemaa, Marlon Dumas, Anna Leontjeva et al.

Predictive process monitoring is concerned with the analysis of events produced during the execution of a business process in order to predict as early as possible the final outcome of an ongoing case. Traditionally, predictive process monitoring methods are optimized with respect to accuracy. However, in environments where users make decisions and take actions in response to the predictions they receive, it is equally important to optimize the stability of the successive predictions made for each case. To this end, this paper defines a notion of temporal stability for binary classification tasks in predictive process monitoring and evaluates existing methods with respect to both temporal stability and accuracy. We find that methods based on XGBoost and LSTM neural networks exhibit the highest temporal stability. We then show that temporal stability can be enhanced by hyperparameter-optimizing random forests and XGBoost classifiers with respect to inter-run stability. Finally, we show that time series smoothing techniques can further enhance temporal stability at the expense of slightly lower accuracy.