TalkPlayData 2: An Agentic Synthetic Data Pipeline for Multimodal Conversational Music Recommendation
This work addresses the need for synthetic datasets to train multimodal conversational recommendation systems; the contribution is incremental, since it builds on prior synthetic data generation methods.
The authors address synthetic data generation for multimodal conversational music recommendation by developing TalkPlayData 2, an agentic pipeline built from LLM agents with multimodal capabilities. In the reported evaluations, the dataset met its stated goal for training generative recommendation models.
We present TalkPlayData 2, a synthetic dataset for multimodal conversational music recommendation generated by an agentic data pipeline. In the proposed pipeline, multiple large language model (LLM) agents are created under various roles with specialized prompts and access to different parts of information, and the chat data is acquired by logging the conversation between the Listener LLM and the Recsys LLM. To cover various conversation scenarios, for each conversation, the Listener LLM is conditioned on a finetuned conversation goal. Finally, all the LLMs are multimodal with audio and images, allowing a simulation of multimodal recommendation and conversation. In the LLM-as-a-judge and subjective evaluation experiments, TalkPlayData 2 achieved the proposed goal in various aspects related to training a generative recommendation model for music. TalkPlayData 2 and its generation code are released at https://talkpl.ai/talkplaydata2.
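The role-based setup described above can be illustrated with a minimal, text-only sketch. It is not the released generation code: the `Agent` class, the `simulate_conversation` function, the `echo_llm` stub, and all prompts and field names are hypothetical, and a real pipeline would replace the stub with calls to a multimodal LLM. The sketch only shows the two ideas stated in the abstract: each agent sees a different part of the information (here, only the Listener sees the conversation goal), and the chat data is obtained by logging the exchange between the two agents.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for an LLM call: maps a prompt string to a reply string.
LLMFn = Callable[[str], str]


@dataclass
class Agent:
    role: str                     # e.g. "listener" or "recsys"
    system_prompt: str            # role-specific prompt
    visible_info: Dict[str, str]  # the part of the information this role may see
    llm: LLMFn

    def respond(self, history: List[Dict[str, str]]) -> str:
        # Build the prompt from this agent's own information plus the shared chat history.
        context = "\n".join(f"{k}: {v}" for k, v in sorted(self.visible_info.items()))
        transcript = "\n".join(f"{t['role']}: {t['content']}" for t in history)
        return self.llm(f"{self.system_prompt}\n{context}\n{transcript}")


def simulate_conversation(listener: Agent, recsys: Agent, num_turns: int) -> List[Dict[str, str]]:
    """Alternate Listener and Recsys messages and return the logged conversation."""
    log: List[Dict[str, str]] = []
    for _ in range(num_turns):
        log.append({"role": listener.role, "content": listener.respond(log)})
        log.append({"role": recsys.role, "content": recsys.respond(log)})
    return log


def echo_llm(tag: str) -> LLMFn:
    # Deterministic stub so the sketch runs without any model or network access.
    def _call(prompt: str) -> str:
        return f"[{tag}] reply to a prompt of {len(prompt)} characters"
    return _call


# Only the Listener is conditioned on the conversation goal (assumed example values).
listener = Agent(
    role="listener",
    system_prompt="You are a music listener chatting with a recommender.",
    visible_info={"conversation_goal": "find upbeat tracks for running"},
    llm=echo_llm("listener"),
)
recsys = Agent(
    role="recsys",
    system_prompt="You are a music recommender.",
    visible_info={"catalog": "track_a, track_b"},
    llm=echo_llm("recsys"),
)

chat_log = simulate_conversation(listener, recsys, num_turns=2)
```

Keeping `visible_info` separate per agent is one simple way to model "access to different parts of information"; audio and image inputs are omitted here for brevity.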