LG MA · Feb 27

Dynamics of Learning under User Choice: Overspecialization and Peer-Model Probing

Adhyyan Narang, Sarah Dean, Lillian J Ratliff, Maryam Fazel

arXiv:2602.23565v11.4 · h-index: 28

Originality: Incremental advance

AI Analysis

This work addresses a critical issue for platforms in competitive markets, where user choice fragments data across platforms. The advance is incremental, as it builds on prior work in knowledge distillation and feedback mechanisms.

The paper tackles the problem of machine learning models that overspecialize and perform poorly on the full population when users choose among multiple platforms, showing that existing algorithms can converge to arbitrarily bad global performance. The authors propose a peer-model probing algorithm that converges almost surely to a stationary point with bounded full-population risk when the probing sources are sufficiently informative, and they verify the findings with semi-synthetic experiments on the MovieLens, Census, and Amazon Sentiment datasets.

In many economically relevant contexts where machine learning is deployed, multiple platforms obtain data from the same pool of users, each of whom selects the platform that best serves them. Prior work in this setting focuses exclusively on the "local" losses of learners on the distribution of data that they observe. We find that there exist instances where learners who use existing algorithms almost surely converge to models with arbitrarily poor global performance, even when models with low full-population loss exist. This happens through a feedback-induced mechanism, which we call the overspecialization trap: as learners optimize for users who already prefer them, they become less attractive to users outside this base, which further restricts the data they observe. Inspired by the recent use of knowledge distillation in modern ML, we propose an algorithm that allows learners to "probe" the predictions of peer models, enabling them to learn about users who do not select them. Our analysis characterizes when probing succeeds: this procedure converges almost surely to a stationary point with bounded full-population risk when probing sources are sufficiently informative, e.g., a known market leader or a majority of peers with good global performance. We verify our findings with semi-synthetic experiments on the MovieLens, Census, and Amazon Sentiment datasets.
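The feedback loop described in the abstract can be illustrated with a toy simulation. This is a minimal sketch under strong assumptions, not the paper's algorithm: the scalar models, the squared loss, the two-group population, the function name `simulate`, and the rule of pseudo-labeling non-users with the prediction of the peer they chose are all hypothetical choices made for illustration.

```python
import numpy as np

def simulate(probe, rounds=50, seed=0):
    """Toy dynamics: users pick the learner with lower loss, then learners refit.

    Hypothetical setup (not from the paper): each learner holds one scalar
    model, and a user's loss is the squared distance to their target.
    """
    rng = np.random.default_rng(seed)
    # Assumed population: two user groups with scalar targets near -1 and +1.
    y = np.concatenate([rng.normal(-1.0, 0.1, 500), rng.normal(1.0, 0.1, 500)])
    theta = np.array([-0.5, 0.5])  # initial model of each learner

    for _ in range(rounds):
        loss = (y[:, None] - theta[None, :]) ** 2  # shape (users, learners)
        choice = loss.argmin(axis=1)               # each user picks the best learner
        new_theta = theta.copy()
        for k in range(2):
            mine = choice == k
            if probe:
                # Probing (illustrative rule): for users who did not choose
                # learner k, use the prediction of the peer they did choose
                # as a pseudo-label, then fit on the whole population.
                labels = np.where(mine, y, theta[choice])
                new_theta[k] = labels.mean()
            elif mine.any():
                # Local-only update: fit to the users who selected learner k.
                new_theta[k] = y[mine].mean()
        theta = new_theta

    # Full-population risk of each learner's final model.
    full_risk = ((y[:, None] - theta[None, :]) ** 2).mean(axis=0)
    return theta, full_risk

for probe in (False, True):
    theta, risk = simulate(probe)
    print("probe" if probe else "local", theta.round(2), risk.round(2))
```

In this toy setup the local-only learners settle near the two group centers and incur high loss on the group they never observe, while the probing variant pulls each model toward the rest of the population and ends with lower full-population risk. The sketch only conveys the mechanism; it says nothing about the conditions under which the paper's guarantees hold.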
