David M. Pereira

2 Papers

23.4 CY Apr 6
Who is the author? A legal and normative view of authorship in Generative AI-aided academic works

David M. Pereira

The widespread adoption of generative artificial intelligence (GenAI) tools in higher education has fundamentally altered the conditions under which academic work is produced, challenging long-standing assumptions about authorship, responsibility, and learning. While much of the existing literature has focused on technical, ethical, or pedagogical implications of GenAI, comparatively little attention has been paid to the legal and normative aspects of authorship in AI-aided academic work. In this work, we examine how the use of GenAI intersects with the concept of authorship as understood within European regulatory and institutional frameworks. Drawing primarily on European copyright law, notably the requirement of human intellectual creation, the paper argues that authorship functions as a qualitative threshold rather than a binary attribute. Authorship may remain attributable to the student where GenAI operates as cognitive support under human intellectual control. By contrast, attribution becomes legally and normatively disputable once AI output displaces creative autonomy. The analysis places this doctrinal framework alongside broader regulatory principles arising from the AI Act, data protection law, and emerging suprainstitutional governance practices in higher education. We propose a qualitative threshold framework designed to assist in authorship-sensitive assessment of GenAI-aided academic work. This framework provides criteria for distinguishing legitimate AI-assisted academic production from practices that undermine authorship, responsibility, and academic integrity.

LG Oct 27, 2021
The chemical space of terpenes: insights from data science and AI

Morteza Hosseini, David M. Pereira

Terpenes are a widespread class of natural products with significant chemical and biological diversity, and many of these molecules have already made their way into medicines. Given the thousands of molecules already described, fully characterizing this chemical space is challenging when relying on classical approaches. In this work we employ a data science-based approach to systematically identify, compile and characterize the diversity of terpenes currently known. We worked with a natural product database, COCONUT, from which we extracted information for nearly 60000 terpenes. For these molecules, we conducted a subclass-by-subclass analysis in which we highlight chemical and physical properties relevant to fields such as natural products chemistry, medicinal chemistry and drug discovery, among others. We were also interested in assessing the potential of these data for clustering and classification tasks. For clustering, we applied and compared k-means with agglomerative clustering, both on the original data and following a dimensionality reduction step, for which PCA, FastICA, Kernel PCA, t-SNE and UMAP were used and benchmarked. We also employed a number of methods to classify terpene subclasses using their physico-chemical descriptors: light gradient boosting machine, k-nearest neighbors, random forests, Gaussian naive Bayes and multilayer perceptron. The best-performing algorithms yielded accuracy, F1 score, precision and other metrics all over 0.9, showing the capabilities of these approaches for the classification of terpene subclasses.
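The abstract reports accuracy, F1 score and precision for a multiclass problem (terpene subclasses) without saying how they were computed or averaged. As an illustration only, the sketch below shows one common way to derive these metrics from true and predicted labels, using macro averaging. The function names, the averaging choice and the toy labels are assumptions made for this example; they are not taken from the paper and do not reproduce its data or results.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[t] += 1  # t was the true class but was missed
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


def macro_average(metrics):
    """Unweighted mean of (precision, recall, f1) over all classes."""
    n = len(metrics)
    return tuple(sum(m[i] for m in metrics.values()) / n for i in range(3))


# Toy labels, invented for illustration (not data from the paper).
y_true = ["mono", "mono", "mono", "sesqui", "sesqui", "sesqui", "di", "di"]
y_pred = ["mono", "mono", "sesqui", "sesqui", "sesqui", "sesqui", "di", "mono"]

metrics = per_class_metrics(y_true, y_pred)
acc = accuracy(y_true, y_pred)
macro_precision, macro_recall, macro_f1 = macro_average(metrics)

print(f"accuracy={acc:.3f}")
for label, (p, r, f) in metrics.items():
    print(f"{label}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
print(f"macro: precision={macro_precision:.3f} "
      f"recall={macro_recall:.3f} f1={macro_f1:.3f}")
```

Macro averaging weights every subclass equally, which matters when subclasses differ a lot in size. A weighted or micro average would give different numbers on the same predictions, so a reported score is only interpretable alongside the averaging scheme.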