CLMar 1Code
MOSAIC: Modular Opinion Summarization using Aspect Identification and ClusteringPiyush Kumar Singh, Jayesh Choudhari
Reviews are central to how travelers evaluate products on online marketplaces, yet existing summarization research often emphasizes end-to-end quality while overlooking benchmark reliability and the practical utility of granular insights. To address this, we propose MOSAIC, a scalable, modular framework designed for industrial deployment that decomposes summarization into interpretable components, including theme discovery, structured opinion extraction, and grounded summary generation. We validate the practical impact of our approach through online A/B tests on live product pages, showing that surfacing intermediate outputs improves customer experience and delivers measurable value even prior to full summarization deployment. We further conduct extensive offline experiments to demonstrate that MOSAIC achieves superior aspect coverage and faithfulness compared to strong baselines for summarization. Crucially, we introduce opinion clustering as a system-level component and show that it significantly enhances faithfulness, particularly under the noisy and redundant conditions typical of user reviews. Finally, we identify reliability limitations in the standard SPACE dataset and release a new open-source tour experience dataset (TRECS) to enable more robust evaluation.
LGJul 18, 2025
Prompt Smart, Pay Less: Cost-Aware APO for Real-World ApplicationsJayesh Choudhari, Piyush Kumar Singh, Douglas McIlwraith et al.
Prompt design is a critical factor in the effectiveness of Large Language Models (LLMs), yet remains largely heuristic, manual, and difficult to scale. This paper presents the first comprehensive evaluation of Automatic Prompt Optimization (APO) methods for real-world, high-stakes multiclass classification in a commercial setting, addressing a critical gap in the existing literature where most of the APO frameworks have been validated only on benchmark classification tasks of limited complexity. We introduce APE-OPRO, a novel hybrid framework that combines the complementary strengths of APE and OPRO, achieving notably better cost-efficiency, around $18\%$ improvement over OPRO, without sacrificing performance. We benchmark APE-OPRO alongside both gradient-free (APE, OPRO) and gradient-based (ProTeGi) methods on a dataset of ~2,500 labeled products. Our results highlight key trade-offs: ProTeGi offers the strongest absolute performance at lower API cost but higher computational time as noted in~\cite{protegi}, while APE-OPRO strikes a compelling balance between performance, API efficiency, and scalability. We further conduct ablation studies on depth and breadth hyperparameters, and reveal notable sensitivity to label formatting, indicating implicit sensitivity in LLM behavior. These findings provide actionable insights for implementing APO in commercial applications and establish a foundation for future research in multi-label, vision, and multimodal prompt optimization scenarios.