Thanassis Tiropanis

DB
3papers
19citations
Novelty32%
AI Score38

3 Papers

LGOct 22, 2022
Explanation Shift: Detecting distribution shifts on tabular data via the explanation space

Carlos Mougan, Klaus Broelemann, Gjergji Kasneci et al.

As input data distributions evolve, the predictive performance of machine learning models tends to deteriorate. In the past, predictive performance was considered the key indicator to monitor. However, explanation aspects have come to attention within the last years. In this work, we investigate how model predictive performance and model explanation characteristics are affected under distribution shifts and how these key indicators are related to each other for tabular data. We find that the modeling of explanation shifts can be a better indicator for the detection of predictive performance changes than state-of-the-art techniques based on representations of distribution shifts. We provide a mathematical analysis of different types of distribution shifts as well as synthetic experimental examples.

8.3DBApr 23
Implementation and Privacy Guarantees for Scalable Keyword Search on SOLID-based Decentralized Data with Granular Visibility Constraints

Mohamed Ragab, Faria Ferooz, Mohammad Bahrani et al.

In decentralized personal data ecosystems grounded in architectures such as Solid, users retain sovereignty over their data via personal online data stores (pods), hosted on Solid-compliant server infrastructures. In such environments, data remains under the control of pod owners, which complicates search due to distribution across numerous pods and user-specific access constraints. ESPRESSO is a decentralized framework for scalable keyword-based search across distributed Solid pods under user-defined visibility policies. It addresses key challenges of decentralized search by constructing WebID-scoped indexes within pods and employing privacy-aware metadata to enable efficient source selection and ranking across servers. This paper further introduces a formal threat model for ESPRESSO, analysing the security and privacy risks associated with the generation, aggregation, and use of indexes and metadata. These risks include unintended metadata leakage and the potential for adversaries to infer sensitive information about data that resides within personal data stores. The analysis identifies key design principles that limit metadata exposure while mitigating unauthorized inference. The proposed threat model provides a foundation for evaluating privacy-preserving decentralized search and informs the design of systems with stronger privacy guarantees.

SEAug 16, 2019Code
FAIR and Open Computer Science Research Software

Wilhelm Hasselbring, Leslie Carr, Simon Hettrick et al.

In computational science and in computer science, research software is a central asset for research. Computational science is the application of computer science and software engineering principles to solving scientific problems, whereas computer science is the study of computer hardware and software design. The Open Science agenda holds that science advances faster when we can build on existing results. Therefore, research software has to be reusable for advancing science. Thus, we need proper research software engineering for obtaining reusable and sustainable research software. This way, software engineering methods may improve research in other disciplines. However, research in software engineering and computer science itself will also benefit from reuse when research software is involved. For good scientific practice, the resulting research software should be open and adhere to the FAIR principles (findable, accessible, interoperable and repeatable) to allow repeatability, reproducibility, and reuse. Compared to research data, research software should be both archived for reproducibility and actively maintained for reusability. The FAIR data principles do not require openness, but research software should be open source software. Established open source software licenses provide sufficient licensing options, such that it should be the rare exception to keep research software closed. We review and analyze the current state in this area in order to give recommendations for making computer science research software FAIR and open. We observe that research software publishing practices in computer science and in computational science show significant differences.