AI CLMay 2, 2024

Bayesian Optimization with LLM-Based Acquisition Functions for Natural Language Preference Elicitation

David Eric Austin, Anton Korikov, Armin Toroghi, Scott Sanner

arXiv:2405.00981v217.224 citationsh-index: 9RecSys

Originality Incremental advance

AI Analysis

This work addresses the problem of cold-start preference elicitation for personalized conversational recommendation systems, representing an incremental improvement by integrating existing techniques.

The paper tackles the challenge of efficiently eliciting user preferences in conversational recommendation systems by combining Bayesian optimization with large language models, resulting in a method that achieves an MRR@10 of 0.27 after 10 dialogue turns, outperforming a baseline LLM approach with 0.17.

Designing preference elicitation (PE) methodologies that can quickly ascertain a user's top item preferences in a cold-start setting is a key challenge for building effective and personalized conversational recommendation (ConvRec) systems. While large language models (LLMs) enable fully natural language (NL) PE dialogues, we hypothesize that monolithic LLM NL-PE approaches lack the multi-turn, decision-theoretic reasoning required to effectively balance the exploration and exploitation of user preferences towards an arbitrary item set. In contrast, traditional Bayesian optimization PE methods define theoretically optimal PE strategies, but cannot generate arbitrary NL queries or reason over content in NL item descriptions -- requiring users to express preferences via ratings or comparisons of unfamiliar items. To overcome the limitations of both approaches, we formulate NL-PE in a Bayesian Optimization (BO) framework that seeks to actively elicit NL feedback to identify the best recommendation. Key challenges in generalizing BO to deal with natural language feedback include determining: (a) how to leverage LLMs to model the likelihood of NL preference feedback as a function of item utilities, and (b) how to design an acquisition function for NL BO that can elicit preferences in the infinite space of language. We demonstrate our framework in a novel NL-PE algorithm, PEBOL, which uses: 1) Natural Language Inference (NLI) between user preference utterances and NL item descriptions to maintain Bayesian preference beliefs, and 2) BO strategies such as Thompson Sampling (TS) and Upper Confidence Bound (UCB) to steer LLM query generation. We numerically evaluate our methods in controlled simulations, finding that after 10 turns of dialogue, PEBOL can achieve an MRR@10 of up to 0.27 compared to the best monolithic LLM baseline's MRR@10 of 0.17, despite relying on earlier and smaller LLMs.

View on arXiv PDF

Similar