Anastasiia Birillo

SE
h-index18
8papers
87citations
Novelty34%
AI Score47

8 Papers

23.0CYJun 2
AI-Generated Traces for Novice Programmers: Learning Effects and Learner Differences in a Multi-Institutional Study

Yuri Noviello, Naaz Sibia, Anastasiia Birillo et al.

Introductory programming (CS1) courses often struggle to support students' understanding of program execution. While visualizations can make execution processes explicit, their effectiveness depends on design and context, and empirical evidence for AI-generated visualizations remains limited. We propose Generated Animated Traces (GATs), AI-generated, analogy-based, narrated animations that coordinate source code, execution state, and conceptual analogies. We conduct a study at two institutions in CS1 courses (Python, N=961; Java N=151) comparing GATs to textual explanations. We measure immediate learning performance and experience, end-of-course engagement and exam performance. Results show that GATs can yield selective benefits for immediate learning, but benefits are context-dependent and short-term. We observe that GATs' influence on performance is moderated by learner engagement profiles. This finding underscores the importance of personalized approaches.

SENov 9, 2025
Understanding Student Interaction with AI-Powered Next-Step Hints: Strategies and Challenges

Anastasiia Birillo, Aleksei Rostovskii, Yaroslav Golubev et al.

Automated feedback generation plays a crucial role in enhancing personalized learning experiences in computer science education. Among different types of feedback, next-step hint feedback is particularly important, as it provides students with actionable steps to progress towards solving programming tasks. This study investigates how students interact with an AI-driven next-step hint system in an in-IDE learning environment. We gathered and analyzed a dataset from 34 students solving Kotlin tasks, containing detailed hint interaction logs. We applied process mining techniques and identified 16 common interaction scenarios. Semi-structured interviews with 6 students revealed strategies for managing unhelpful hints, such as adapting partial hints or modifying code to generate variations of the same hint. These findings, combined with our publicly available dataset, offer valuable opportunities for future research and provide key insights into student behavior, helping improve hint design for enhanced learning support.

SEOct 11, 2024Code
One Step at a Time: Combining LLMs and Static Analysis to Generate Next-Step Hints for Programming Tasks

Anastasiia Birillo, Elizaveta Artser, Anna Potriasaeva et al.

Students often struggle with solving programming problems when learning to code, especially when they have to do it online, with one of the most common disadvantages of working online being the lack of personalized help. This help can be provided as next-step hint generation, i.e., showing a student what specific small step they need to do next to get to the correct solution. There are many ways to generate such hints, with large language models (LLMs) being among the most actively studied right now. While LLMs constitute a promising technology for providing personalized help, combining them with other techniques, such as static analysis, can significantly improve the output quality. In this work, we utilize this idea and propose a novel system to provide both textual and code hints for programming tasks. The pipeline of the proposed approach uses a chain-of-thought prompting technique and consists of three distinct steps: (1) generating subgoals - a list of actions to proceed with the task from the current student's solution, (2) generating the code to achieve the next subgoal, and (3) generating the text to describe this needed action. During the second step, we apply static analysis to the generated code to control its size and quality. The tool is implemented as a modification to the open-source JetBrains Academy plugin, supporting students in their in-IDE courses. To evaluate our approach, we propose a list of criteria for all steps in our pipeline and conduct two rounds of expert validation. Finally, we evaluate the next-step hints in a classroom with 14 students from two universities. Our results show that both forms of the hints - textual and code - were helpful for the students, and the proposed system helped them to proceed with the coding tasks.

39.3CYApr 15
ANVIL: Analogies and Videos for Lecturers

Yuri Noviello, Anastasiia Birillo, Gosia Migut

We present ANVIL, a multimodal generative system that automates the production of analogy-based instructional animations for computer science topics. Given a concept definition, ANVIL generates a textual analogy, compiles it into a structured visual screenplay, and produces executable manim code to render an animation, with an automated repair mechanism to improve robustness. Evaluating such systems at scale requires balancing pedagogical validity with scalability. We begin with a teacher evaluation to ground the quality assessment and use its findings to guide automated screening. For textual analogies, we introduce an LLM-based evaluator for scalable quality screening; for videos, where subjective judgments are difficult to automate, we instead assess fidelity to the intended screenplay using an automated proxy for auditing and error analysis. We further conduct a user study with educators to examine adoption requirements and risks. Our findings suggest that ANVIL can produce materials that are frequently rated as adequate, and that educators respond positively to its perceived value and usability.

SEJan 5
Enhancing Debugging Skills with AI-Powered Assistance: A Real-Time Tool for Debugging Support

Elizaveta Artser, Daniil Karol, Anna Potriasaeva et al.

Debugging is a crucial skill in programming education and software development, yet it is often overlooked in CS curricula. To address this, we introduce an AI-powered debugging assistant integrated into an IDE. It offers real-time support by analyzing code, suggesting breakpoints, and providing contextual hints. Using RAG with LLMs, program slicing, and custom heuristics, it enhances efficiency by minimizing LLM calls and improving accuracy. A three-level evaluation - technical analysis, UX study, and classroom tests - highlights its potential for teaching debugging.

SEFeb 12, 2022
Reflekt: a Library for Compile-Time Reflection in Kotlin

Anastasiia Birillo, Elena Lyulina, Maria Malysheva et al.

Reflection in Kotlin is a powerful mechanism to introspect program behavior during its execution at run-time. However, among the variety of practical tasks involving reflection, there are scenarios when the poor performance of run-time approaches becomes a significant disadvantage. This problem manifests itself in Kotless, a popular framework for developing serverless applications, because the faster the applications launch, the less their cloud infrastructure costs. In this paper, we present Reflekt - a compile-time reflection library which allows to perform the search among classes, object expressions (which in Kotlin are implemented as singleton classes), and functions in Kotlin code based on the given search query. It comes with a convenient DSL and better performance comparing to the existing run-time reflection approaches. Our experiments show that replacing run-time reflection calls with Reflekt in serverless applications created with Kotless resulted in a significant performance boost in start-up time of these applications.

SEDec 6, 2021
Hyperstyle: A Tool for Assessing the Code Quality of Solutions to Programming Assignments

Anastasiia Birillo, Ilya Vlasov, Artyom Burylov et al.

In software engineering, it is not enough to simply write code that only works as intended, even if it is free from vulnerabilities and bugs. Every programming language has a style guide and a set of best practices defined by its community, which help practitioners to build solutions that have a clear structure and therefore are easy to read and maintain. To introduce assessment of code quality into the educational process, we developed a tool called Hyperstyle. To make it reflect the needs of the programming community and at the same time be easily extendable, we built it upon several existing professional linters and code checkers. Hyperstyle supports four programming languages (Python, Java, Kotlin, and Javascript) and can be used as a standalone tool or integrated into a MOOC platform. We have integrated the tool into two educational platforms, Stepik and JetBrains Academy, and it has been used to process about one million submissions every week since May 2021.

SEDec 9, 2020
TaskTracker-tool: a Toolkit for Tracking of Code Snapshots and Activity Data During Solution of Programming Tasks

Elena Lyulina, Anastasiia Birillo, Vladimir Kovalenko et al.

The process of writing code and use of features in an integrated development environment (IDE) is a fruitful source of data in computing education research. Existing studies use records of students' actions in the IDE, consecutive code snapshots, compilation events, and others, to gain deep insight into the process of student programming. In this paper, we present a set of tools for collecting and processing data of student activity during problem-solving. The first tool is a plugin for IntelliJ-based IDEs (PyCharm, IntelliJ IDEA, CLion). By capturing snapshots of code and IDE interaction data, it allows to analyze the process of writing code in different languages -- Python, Java, Kotlin, and C++. The second tool is designed for the post-processing of data collected by the plugin and is capable of basic analysis and visualization. To validate and showcase the toolkit, we present a dataset collected by our tools. It consists of records of activity and IDE interaction events during solution of programming tasks by 148 participants of different ages and levels of programming experience. We propose several directions for further exploration of the dataset.