48.2 AI Apr 22
HiPO: Hierarchical Preference Optimization for Adaptive Reasoning in LLMs
Darsh Kachroo, Adriana Caraeni, Arjun Prasaath Anbazhagan et al.
Direct Preference Optimization (DPO) is an effective framework for aligning large language models with human preferences, but it struggles with complex reasoning tasks. DPO optimizes for the likelihood of generating preferred over dispreferred responses in their entirety and lacks the granularity to provide feedback on subsections of many-step solutions typical of reasoning tasks. Existing methods excel at either stable preference learning (e.g., DPO variants like KTO and RSO) or structured reasoning (e.g., ReMA's multi-agent RL framework, Tree of Thoughts), but fail to merge these complementary strengths. We propose HiPO (Hierarchical Preference Optimization), an extension of DPO that separates responses into reasoning segments (query clarification and context, reasoning steps, and answer) and computes loss as a weighted sum of the DPO loss for each segment. Our approach enables segment-specific training while maintaining DPO's computational efficiency and training stability. We demonstrate that for multiple 7B LLMs fine-tuned using HiPO and DPO on the Math Stack Exchange preference dataset, the models trained with HiPO outperform the others on a variety of common math benchmarks and achieve greater organization, logical flow, and consistency as measured by GPT-4.1.
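The segment-weighted loss described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the segment names, the dictionary layout, the example weights, and `beta = 0.1` are all assumptions, and each segment's log-probabilities are taken as already summed over that segment's tokens.

```python
import math

# Hypothetical segment names, following the three parts named in the abstract.
SEGMENTS = ("query", "reasoning", "answer")


def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair, given summed log-probs
    under the policy (pol_*) and the frozen reference model (ref_*)."""
    margin = beta * ((pol_chosen - ref_chosen) - (pol_rejected - ref_rejected))
    # -log(sigmoid(margin)), written in a numerically stable softplus form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def hipo_loss(segment_logps, weights, beta=0.1):
    """Weighted sum of per-segment DPO losses (assumed form of the HiPO loss).

    segment_logps maps each segment name to a dict with the four log-probs
    that dpo_loss expects; weights maps each segment name to a float.
    """
    return sum(
        weights[name] * dpo_loss(beta=beta, **segment_logps[name])
        for name in SEGMENTS
    )


# Illustrative numbers only: the policy prefers the chosen reasoning segment
# more strongly than the reference does, and is neutral on the other two.
example_logps = {
    "query": dict(pol_chosen=-5.0, pol_rejected=-5.0, ref_chosen=-5.0, ref_rejected=-5.0),
    "reasoning": dict(pol_chosen=-20.0, pol_rejected=-30.0, ref_chosen=-25.0, ref_rejected=-25.0),
    "answer": dict(pol_chosen=-3.0, pol_rejected=-3.0, ref_chosen=-3.0, ref_rejected=-3.0),
}
example_weights = {"query": 0.2, "reasoning": 0.6, "answer": 0.2}
print(hipo_loss(example_logps, example_weights))
```

Because each term is an ordinary DPO loss, setting one weight to 1 and the others to 0 reduces the sketch to DPO applied to a single segment.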
1.5 ET Apr 17
Potential Energy Savings from Quantum Computing-Based Route Optimization
Ayush Nadiger, Adriana Caraeni, Katie Schouten
We investigate the potential of the Quantum Approximate Optimization Algorithm (QAOA) for reducing energy consumption in route planning, a key challenge in logistics due to the NP-hard nature of the Traveling Salesman and Vehicle Routing Problems. By encoding route optimization as a Quadratic Unconstrained Binary Optimization (QUBO) problem and implementing QAOA circuits at depth p = 3-5 alongside classical baselines of Simulated Annealing (SA) and Genetic Algorithms (GA), we perform systematic benchmarks on Euclidean graphs of sizes N = 5, 10, and 20. Our results demonstrate that QAOA attains higher solution quality with approximation ratios of 0.953 (N = 5), 0.921 (N = 10), and 0.903 (N = 20), outperforming SA and GA by 2.7-4.4%. Wall-clock runtimes for QAOA are 2-3x faster than SA across all tested sizes, and energy consumption measurements reveal a three-order-of-magnitude reduction, remaining in the picojoule range versus nanojoules for classical methods. Translating these gains to real-world logistics suggests an 8.2% improvement in routing efficiency could save approximately 2.62 EJ of fuel annually in the U.S., avoiding nearly 1.94 x 10^8 tonnes of CO2 emissions. These findings highlight QAOA's promise as a fast, energy-efficient optimizer for sustainable logistics applications and underscore its potential role in next-generation fleet-management systems.
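To make the QUBO encoding concrete, the sketch below uses a common textbook formulation of the Traveling Salesman Problem: binary variable `x[i][t]` is 1 when city `i` is visited at tour position `t`, and a penalty term enforces one city per position and one position per city. The abstract does not state which encoding, penalty weight, or definition of approximation ratio the authors used, so the `penalty` value and the ratio (optimal length divided by found length) are assumptions. No quantum circuit is involved here; the code only evaluates the objective that an optimizer such as QAOA would minimize.

```python
import itertools
import math


def tour_length(dist, tour):
    """Length of the closed tour that visits cities in the given order."""
    n = len(tour)
    return sum(dist[tour[t]][tour[(t + 1) % n]] for t in range(n))


def tour_to_bits(tour):
    """One-hot encode a tour: x[city][position] = 1."""
    n = len(tour)
    x = [[0] * n for _ in range(n)]
    for position, city in enumerate(tour):
        x[city][position] = 1
    return x


def qubo_energy(dist, x, penalty):
    """Distance term plus quadratic penalties for constraint violations."""
    n = len(dist)
    cost = sum(
        dist[i][j] * x[i][t] * x[j][(t + 1) % n]
        for i in range(n) for j in range(n) if i != j
        for t in range(n)
    )
    # Each city must appear exactly once, and each position must hold one city.
    city_pen = sum((1 - sum(x[i][t] for t in range(n))) ** 2 for i in range(n))
    slot_pen = sum((1 - sum(x[i][t] for i in range(n))) ** 2 for t in range(n))
    return cost + penalty * (city_pen + slot_pen)


def brute_force_optimum(dist):
    """Exact optimum by enumeration; only feasible for very small N."""
    n = len(dist)
    return min(
        tour_length(dist, (0,) + perm)
        for perm in itertools.permutations(range(1, n))
    )


# Four cities on the corners of a unit square (illustrative data).
points = [(0, 0), (1, 0), (1, 1), (0, 1)]
dist = [[math.dist(p, q) for q in points] for p in points]

candidate = [0, 2, 1, 3]  # a valid but suboptimal tour that crosses the square
optimum = brute_force_optimum(dist)
print(qubo_energy(dist, tour_to_bits(candidate), penalty=10.0))
print(optimum / tour_length(dist, candidate))  # assumed approximation ratio
```

For any valid tour both penalty terms vanish, so the QUBO energy equals the tour length; the penalty weight only matters for bit assignments that break the one-hot constraints.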
57.9 HC Apr 15
Cognitive Offloading in Agile Teams: How Artificial Intelligence Reshapes Risk Assessment and Planning Quality
Adriana Caraeni, Alexander Shick, Andrew Lan
Recent advances in artificial intelligence (AI) have shown promise in automating key aspects of Agile project management, yet their impact on team cognition remains underexplored. In this work, we investigate cognitive offloading in Agile sprint planning by conducting a controlled, three-condition experiment comparing AI-only, human-only, and hybrid planning models on a live client deliverable at a mid-sized digital agency. Using quantitative metrics -- including estimation accuracy, rework rates, and scope change recovery time -- alongside qualitative indicators of planning robustness, we evaluate each model's effectiveness beyond raw efficiency. We find that while AI-only planning minimizes time and cost, it significantly degrades risk capture rates and increases rework due to unstated assumptions, whereas human-only planning excels at adaptability but incurs substantial overhead. Drawing on these findings, we propose a theoretical framework for hybrid AI-human sprint planning that assigns algorithmic tools to estimation and backlog formatting while mandating human deliberation for risk assessment and ambiguity resolution. Our results challenge the assumption that efficiency equates to effectiveness, offering actionable governance strategies for organizations seeking to augment rather than erode team cognition.
CY Nov 7, 2024
Evaluating GPT-4 at Grading Handwritten Solutions in Math Exams
Adriana Caraeni, Alexander Scarlatos, Andrew Lan
Recent advances in generative artificial intelligence (AI) have shown promise in accurately grading open-ended student responses. However, few prior works have explored grading handwritten responses due to a lack of data and the challenge of combining visual and textual information. In this work, we leverage state-of-the-art multi-modal AI models, in particular GPT-4o, to automatically grade handwritten responses to college-level math exams. Using real student responses to questions in a probability theory exam, we evaluate GPT-4o's alignment with ground-truth scores from human graders using various prompting techniques. We find that while providing rubrics improves alignment, the model's overall accuracy is still too low for real-world settings, showing there is significant room for growth in this task.