André Martin

LG
3papers
87citations
Novelty38%
AI Score22

3 Papers

LGJan 28, 2023
Heterogeneous Datasets for Federated Survival Analysis Simulation

Alberto Archetti, Eugenio Lomurno, Francesco Lattari et al.

Survival analysis studies time-modeling techniques for an event of interest occurring for a population. Survival analysis found widespread applications in healthcare, engineering, and social sciences. However, the data needed to train survival models are often distributed, incomplete, censored, and confidential. In this context, federated learning can be exploited to tremendously improve the quality of the models trained on distributed data while preserving user privacy. However, federated survival analysis is still in its early development, and there is no common benchmarking dataset to test federated survival models. This work provides a novel technique for constructing realistic heterogeneous datasets by starting from existing non-federated datasets in a reproducible way. Specifically, we propose two dataset-splitting algorithms based on the Dirichlet distribution to assign each data sample to a carefully chosen client: quantity-skewed splitting and label-skewed splitting. Furthermore, these algorithms allow for obtaining different levels of heterogeneity by changing a single hyperparameter. Finally, numerical experiments provide a quantitative evaluation of the heterogeneity level using log-rank tests and a qualitative analysis of the generated splits. The implementation of the proposed methods is publicly available in favor of reproducibility and to encourage common practices to simulate federated environments for survival analysis.

CRMar 31, 2020
Trust Management as a Service: Enabling Trusted Execution in the Face of Byzantine Stakeholders

Franz Gregor, Wojciech Ozga, Sébastien Vaucher et al.

Trust is arguably the most important challenge for critical services both deployed as well as accessed remotely over the network. These systems are exposed to a wide diversity of threats, ranging from bugs to exploits, active attacks, rogue operators, or simply careless administrators. To protect such applications, one needs to guarantee that they are properly configured and securely provisioned with the "secrets" (e.g., encryption keys) necessary to preserve not only the confidentiality, integrity and freshness of their data but also their code. Furthermore, these secrets should not be kept under the control of a single stakeholder - which might be compromised and would represent a single point of failure - and they must be protected across software versions in the sense that attackers cannot get access to them via malicious updates. Traditional approaches for solving these challenges often use ad hoc techniques and ultimately rely on a hardware security module (HSM) as root of trust. We propose a more powerful and generic approach to trust management that instead relies on trusted execution environments (TEEs) and a set of stakeholders as root of trust. Our system, PALAEMON, can operate as a managed service deployed in an untrusted environment, i.e., one can delegate its operations to an untrusted cloud provider with the guarantee that data will remain confidential despite not trusting any individual human (even with root access) nor system software. PALAEMON addresses in a secure, efficient and cost-effective way five main challenges faced when developing trusted networked applications and services. Our evaluation on a range of benchmarks and real applications shows that PALAEMON performs efficiently and can protect secrets of services without any change to their source code.

LGOct 12, 2018
Grand Challenge: Real-time Destination and ETA Prediction for Maritime Traffic

Oleh Bodunov, Florian Schmidt, André Martin et al.

In this paper, we present our approach for solving the DEBS Grand Challenge 2018. The challenge asks to provide a prediction for (i) a destination and the (ii) arrival time of ships in a streaming-fashion using Geo-spatial data in the maritime context. Novel aspects of our approach include the use of ensemble learning based on Random Forest, Gradient Boosting Decision Trees (GBDT), XGBoost Trees and Extremely Randomized Trees (ERT) in order to provide a prediction for a destination while for the arrival time, we propose the use of Feed-forward Neural Networks. In our evaluation, we were able to achieve an accuracy of 97% for the port destination classification problem and 90% (in mins) for the ETA prediction.