Who Owns the Output? Bridging Law and Technology in LLMs Attribution
This work addresses intellectual property and ethical concerns for users and creators of AI-generated content, but it is incremental: it reviews existing tools and proposes a framework without introducing novel technical solutions.
The paper tackles the problem of attributing AI-generated content from LLMs and LMMs, which is challenging because generated content is not systematically fingerprinted and because the scale of the training data makes outputs hard to trace back to their sources. It reviews the available legislative and technological instruments and proposes a legal framework to ensure accountability. The authors acknowledge that current techniques have strong limitations that only new attribution methods can overcome.
Since the introduction of ChatGPT in 2022, large language models (LLMs) and large multimodal models (LMMs) have transformed content creation, enabling the generation of human-quality content across every medium: text, images, video, and audio. The opportunities offered by generative AI models are vast: they drastically reduce the time required to generate content and often raise the quality of the output. However, given the complexity of the generated content and the difficulty of tracing it, the use of these tools poses challenges for attributing AI-generated content. Attribution is difficult for a variety of reasons, ranging from the lack of systematic fingerprinting of generated content to the enormous amount of data on which LLMs and LMMs are trained, which makes it hard to connect generated content to the training data. This situation raises concerns about intellectual property and ethical responsibility. To address these concerns, in this paper we bridge the technological, ethical, and legislative aspects by reviewing the legislative and technological instruments available today and proposing a legal framework to ensure accountability. Finally, we present three use cases showing how these instruments can be combined to ensure that attribution is respected. However, even though the techniques available today can guarantee attribution to a greater extent, strong limitations still apply; these can be resolved only by developing new attribution techniques for LLMs and LMMs.