The Extractive-Abstractive Axis: Measuring Content "Borrowing" in Generative Language Models
This work addresses content attribution for users of generative models, but its contribution is incremental: it proposes a framework rather than implementing a solution.
The paper tackles the problem of measuring content "borrowing" in generative language models by proposing an Extractive-Abstractive axis for benchmarking, and highlights the need for metrics, datasets, and annotation guidelines given the implications for content licensing and attribution.
Generative language models produce highly abstractive outputs by design, in contrast to the extractive responses of search engines. Given this characteristic of LLMs and the resulting implications for content licensing and attribution, we propose the so-called Extractive-Abstractive axis for benchmarking generative models and highlight the need for developing corresponding metrics, datasets, and annotation guidelines. We limit our discussion to the text modality.
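
As a purely illustrative sketch of what a metric on such an axis might look like, the snippet below scores an output by the fraction of its word n-grams that appear verbatim in a source text. This is an assumption for illustration, not the metric proposed in the paper: the function names `ngrams` and `extractive_score`, the whitespace tokenization, and the default `n = 3` are all hypothetical choices.

```python
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token list, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extractive_score(source: str, output: str, n: int = 3) -> float:
    """Fraction of the output's n-grams copied verbatim from the source.

    Hypothetical proxy: 1.0 means fully extractive at the n-gram level,
    0.0 means no n-gram overlap (or an output shorter than n tokens).
    Tokenization is naive lowercase whitespace splitting.
    """
    source_grams = set(ngrams(source.lower().split(), n))
    output_grams = ngrams(output.lower().split(), n)
    if not output_grams:
        return 0.0
    copied = sum(1 for gram in output_grams if gram in source_grams)
    return copied / len(output_grams)


if __name__ == "__main__":
    source = "the quick brown fox jumps over the lazy dog"
    for candidate in (
        "the quick brown fox jumps",   # copied span
        "the quick brown cat sleeps",  # partially copied
        "a fast auburn fox leaps",     # paraphrase
    ):
        print(f"{extractive_score(source, candidate):.2f}  {candidate}")
```

A surface-overlap score like this only captures verbatim copying. It cannot tell a faithful paraphrase from unrelated text, which is one reason dedicated datasets and annotation guidelines would be needed alongside any such metric.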