Tirth Bhatt

AI
h-index: 7
3 papers
1 citation
Novelty: 32%
AI Score: 37

3 Papers

33.3 CY Apr 1
Democratizing Foundations of Problem-Solving with AI: A Breadth-First Search Curriculum for Middle School Students

Griffin Pitts, Kimia Fazeli, Tirth Bhatt et al.

As AI becomes more common in students' everyday experiences, a major challenge for K-12 AI education is designing learning experiences that can be meaningfully integrated into existing subject-area instruction. This paper presents the design and implementation of an AI4K12-aligned curriculum that embeds AI learning goals within a rural middle school science classroom using Breadth-First Search (BFS) as an accessible entry point to AI problem-solving. Through unplugged activities and an interactive simulation environment, students learned BFS as a strategy for exploring networks and identifying shortest paths, then applied it to science contexts involving virus spread and contact tracing. To examine engagement and learning, we analyzed pre- and post-assessments, student work artifacts, and a teacher interview. Results suggest that students engaged productively with the curriculum, improved their understanding of BFS and AI problem-solving, and benefited from learning these ideas within ongoing science instruction. Teacher feedback further indicated that the module fit well within the science curriculum while supporting intended science learning outcomes. We conclude with curriculum and design considerations for broadening access to learning about problem-solving with AI in education.
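
For readers unfamiliar with the algorithm the curriculum teaches, the following is a minimal sketch of BFS finding a shortest path in an unweighted network. It is not taken from the paper: the `contacts` network, the names in it, and the function `bfs_shortest_path` are hypothetical and only mirror the contact-tracing framing in the abstract.

```python
from collections import deque


def bfs_shortest_path(graph, start, goal):
    """Return one shortest path from start to goal in an unweighted graph, or None."""
    parents = {start: None}      # also serves as the visited set
    queue = deque([start])       # FIFO queue gives level-by-level exploration
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk parent links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical contact network: who was in contact with whom.
contacts = {
    "Ana": ["Ben", "Cy"],
    "Ben": ["Ana", "Dee"],
    "Cy": ["Ana", "Dee"],
    "Dee": ["Ben", "Cy", "Eli"],
    "Eli": ["Dee"],
}

print(bfs_shortest_path(contacts, "Ana", "Eli"))  # ['Ana', 'Ben', 'Dee', 'Eli']
```

Because BFS explores the network one "hop" at a time, the first path that reaches the goal uses the fewest contacts, which is the property the shortest-path framing relies on.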

67.0 AI May 13
Retrieval-Augmented Tutoring for Algorithm Tracing and Problem-Solving in AI Education

Mragisha Jain, Tirth Bhatt, Griffin Pitts et al.

Students learning algorithms often need support as they interpret traces, debug reasoning errors, and apply procedures across unfamiliar problem instances. In this paper, we present KITE (Knowledge-Informed Tutoring Engine), a Retrieval-Augmented Generation (RAG)-based intelligent tutoring system designed to serve as a classroom teaching assistant for algorithmic reasoning and problem-solving tasks. KITE uses an intent-aware Socratic response strategy to tailor support to different student needs, responding with targeted hints, guiding questions, and progressive scaffolding intended to strengthen students' algorithmic problem-solving ability. To keep responses aligned with course content, KITE uses a multimodal RAG pipeline that retrieves relevant information from course materials. We evaluate KITE using three forms of assessment: RAGAs-based metrics for response grounding and quality, expert evaluation of pedagogical quality, and a simulated student pipeline in which a weaker language model interacts with KITE across two-turn dialogues and produces revised answers after receiving feedback. Results indicate that KITE produces contextually grounded and pedagogically appropriate responses. Further, using simulated students, KITE's feedback helped the student models produce more accurate follow-up responses on procedural and tracing questions, suggesting that its scaffolding can support algorithmic problem-solving. This work contributes a tutoring architecture and an evaluation approach for assessing retrieval-grounded explanations and scaffolded problem-solving feedback.
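
To illustrate the general idea of retrieving course material before answering, here is a toy sketch of a retrieval step. It is an assumption-laden stand-in, not KITE's pipeline: the abstract does not describe the retriever, so this uses simple bag-of-words cosine similarity over text only, and `tokenize`, `cosine`, `retrieve`, and `course_notes` are all hypothetical names and data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, passages, k=1):
    """Return the k passages most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical course notes standing in for real course materials.
course_notes = [
    "BFS uses a queue and visits nodes level by level.",
    "DFS uses a stack and explores one branch fully before backtracking.",
    "A heuristic estimates the remaining cost to the goal.",
]

top = retrieve("Why does BFS use a queue?", course_notes)
print(top[0])
```

In a RAG-style tutor, the retrieved passage would then be supplied to the language model as context so that hints stay aligned with what the course actually teaches.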

SE Oct 7, 2025
Automated Program Repair of Uncompilable Student Code

Griffin Pitts, Aum Pandya, Darsh Rank et al.

A significant portion of student programming submissions in CS1 learning environments are uncompilable, limiting their use in student modeling and downstream knowledge tracing. Traditional modeling pipelines often exclude these cases, discarding observations of student learning. This study investigates automated program repair as a strategy to recover uncompilable code while preserving students' structural intent for use in student modeling. Within this framework, we assess large language models (LLMs) as repair agents, including GPT-5 (OpenAI), Claude 3.5 Haiku (Anthropic), and Gemini 2.5 Flash (Google), under high- and low-context prompting conditions. Repairs were evaluated for compilability, edit distance, and preservation of students' original structure and logic. We find that while all three LLMs are capable of producing compilable repairs, their behavior diverges in how well they preserve students' control flow and code structure, which affects their pedagogical utility. By recovering uncompilable submissions, this work enables richer and more comprehensive analyses of learners' coding processes and development over time.
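
As a rough illustration of two of the evaluation criteria named above, compilability and edit distance, the sketch below checks whether a snippet parses and measures how far a repair is from the original. The assumptions are significant: the abstract does not state the students' programming language or how edit distance was computed, so this uses Python's built-in `compile()` as a stand-in compiler check and character-level Levenshtein distance. The `broken` and `repaired` snippets are invented examples.

```python
def compiles(source):
    """Return True if the source parses as Python (stand-in for a compile check)."""
    try:
        compile(source, "<student>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


def edit_distance(a, b):
    """Character-level Levenshtein distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


# Invented example: the student forgot the colon after the loop header.
broken = "for i in range(3)\n    print(i)\n"
repaired = "for i in range(3):\n    print(i)\n"

print(compiles(broken), compiles(repaired))  # False True
print(edit_distance(broken, repaired))       # 1
```

A small edit distance alongside a successful compile is one simple signal that a repair stayed close to what the student wrote; judging whether control flow and structure were preserved would need a more structural comparison than this sketch provides.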