AISep 26, 2024Code
Data-Prep-Kit: getting your data ready for LLM application developmentDavid Wood, Boris Lublinsky, Alexy Roytman et al.
Data preparation is the first and a very important step towards any Large Language Model (LLM) development. This paper introduces an easy-to-use, extensible, and scale-flexible open-source data preparation toolkit called Data Prep Kit (DPK). DPK is architected and designed to enable users to scale their data preparation to their needs. With DPK they can prepare data on a local machine or effortlessly scale to run on a cluster with thousands of CPU Cores. DPK comes with a highly scalable, yet extensible set of modules that transform natural language and code data. If the user needs additional transforms, they can be easily developed using extensive DPK support for transform creation. These modules can be used independently or pipelined to perform a series of operations. In this paper, we describe DPK architecture and show its performance from a small scale to a very large number of CPUs. The modules from DPK have been used for the preparation of Granite Models [1] [2]. We believe DPK is a valuable contribution to the AI community to easily prepare data to enhance the performance of their LLM models or to fine-tune models with Retrieval-Augmented Generation (RAG).
CLApr 21
STaD: Scaffolded Task Design for Identifying Compositional Skill Gaps in LLMsSungeun An, Swanand Ravindra Kadhe, Shailja Thakur et al.
Benchmarks are often used as a standard to understand LLM capabilities in different domains. However, aggregate benchmark scores provide limited insight into compositional skill gaps of LLMs and how to improve them. To make these weaknesses visible, we propose Scaffolded Task Design (STaD) framework. STaD generates controlled variations of benchmark tasks based on the concept of scaffolding, which introduces structured, incremental support in a step-by-step manner. Rather than inspecting failures individually, this approach enables systematic and scalable probing of model behavior by identifying the specific reasoning skill compositions they lack. Treating the LLM as a black box, our experiments on six models of varying sizes reveal multiple failure points in three reasoning benchmarks and highlight each model's unique and distinct skill gaps.
HCJun 6, 2022
Understanding Self-Directed Learning in an Online LaboratorySungeun An, Spencer Rugaber, Jennifer Hammock et al.
We described a study on the use of an online laboratory for self-directed learning by constructing and simulating conceptual models of ecological systems. In this study, we could observe only the modeling behaviors and outcomes; the learning goals and outcomes were unknown. We used machine learning techniques to analyze the modeling behaviors of 315 learners and 822 conceptual models they generated. We derive three main conclusions from the results. First, learners manifest three types of modeling behaviors: observation (simulation focused), construction (construction focused), and full exploration (model construction, evaluation and revision). Second, while observation was the most common behavior among all learners, construction without evaluation was more common for less engaged learners and full exploration occurred mostly for more engaged learners. Third, learners who explored the full cycle of model construction, evaluation and revision generated models of higher quality. These modeling behaviors provide insights into self-directed learning at large.
AIApr 24
A Systematic Approach for Large Language Models DebuggingBasel Shbita, Anna Lisa Gentile, Bing Zhang et al.
Large language models (LLMs) have become central to modern AI workflows, powering applications from open-ended text generation to complex agent-based reasoning. However, debugging these models remains a persistent challenge due to their opaque and probabilistic nature and the difficulty of diagnosing errors across diverse tasks and settings. This paper introduces a systematic approach for LLM debugging that treats models as observable systems, providing structured, model-agnostic methods from issue detection to model refinement. By unifying evaluation, interpretability, and error-analysis practices, our approach enables practitioners to iteratively diagnose model weaknesses, refine prompts and model parameters, and adapt data for fine-tuning or assessment, while remaining effective in contexts where standardized benchmarks and evaluation criteria are lacking. We argue that such a structured methodology not only accelerates troubleshooting but also fosters reproducibility, transparency, and scalability in the deployment of LLM-based systems.
AIDec 11, 2023
Using Analytics on Student Created Data to Content Validate Pedagogical ToolsJohn Kos, Kenneth Eaton, Sareen Zhang et al.
Conceptual and simulation models can function as useful pedagogical tools, however it is important to categorize different outcomes when evaluating them in order to more meaningfully interpret results. VERA is a ecology-based conceptual modeling software that enables users to simulate interactions between biotics and abiotics in an ecosystem, allowing users to form and then verify hypothesis through observing a time series of the species populations. In this paper, we classify this time series into common patterns found in the domain of ecological modeling through two methods, hierarchical clustering and curve fitting, illustrating a general methodology for showing content validity when combining different pedagogical tools. When applied to a diverse sample of 263 models containing 971 time series collected from three different VERA user categories: a Georgia Tech (GATECH), North Georgia Technical College (NGTC), and ``Self Directed Learners'', results showed agreement between both classification methods on 89.38\% of the sample curves in the test set. This serves as a good indication that our methodology for determining content validity was successful.
AIDec 16, 2021
Explanation as Question Answering based on Design KnowledgeAshok Goel, Vrinda Nandan, Eric Gregori et al.
Explanation of an AI agent requires knowledge of its design and operation. An open question is how to identify, access and use this design knowledge for generating explanations. Many AI agents used in practice, such as intelligent tutoring systems fielded in educational contexts, typically come with a User Guide that explains what the agent does, how it works and how to use the agent. However, few humans actually read the User Guide in detail. Instead, most users seek answers to their questions on demand. In this paper, we describe a question answering agent (AskJill) that uses the User Guide for an interactive learning environment (VERA) to automatically answer questions and thereby explains the domain, functioning, and operation of VERA. We present a preliminary assessment of AskJill in VERA.
CYMar 30, 2020
Using VERA to explain the impact of social distancing on the spread of COVID-19William Broniec, Sungeun An, Spencer Rugaber et al.
COVID-19 continues to spread across the country and around the world. Current strategies for managing the spread of COVID-19 include social distancing. We present VERA, an interactive AI tool, that first enables users to specify conceptual models of the impact of social distancing on the spread of COVID-19. Then, VERA automatically spawns agent-based simulations from the conceptual models, and, given a data set, automatically fills in the values of the simulation parameters from the data. Next, the user can view the simulation results, and, if needed, revise the simulation parameters and run another experimental trial, or build an alternative conceptual model. We describe the use VERA to develop a SIR model for the spread of COVID-19 and its relationship with healthcare capacity.