Daniela Paolotti

4 Papers

57.4 CY May 29
Context-Conditioned Generative Models Enable Subnational Refinement of Sparse Humanitarian Surveys

Federica Sibilla, Vasiliki Voukelatou, Duccio Piovani et al.

Data scarcity limits inference in many scientific and policy domains. Survey data are essential for decision-making, but sparse samples often fail to capture fine spatial granularities. We evaluate normalizing flows, a class of generative models that learn complex data distributions and can be conditioned on exogenous contextual features, in controlled data scarcity scenarios. Across eight household survey datasets spanning six low-income or middle-income countries in the humanitarian domain, we show that context-conditioned generative models can refine sub-national survey distributions under severe data scarcity, and that performance increases systematically with the richness of the conditioning information. These findings support a general principle for survey data augmentation: generative models can improve sub-national estimates when the sparse sample retains sufficient support and the contextual covariates encode relevant local heterogeneity. By learning full conditional distributions rather than point estimates, the approach provides fine-grained evidence for humanitarian decision-making and resource allocation.
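To make the idea of a context-conditioned flow concrete, here is a minimal sketch of the simplest possible case: a one-dimensional affine flow whose shift and scale depend linearly on a single context value. This is not the model evaluated in the paper; the parameter tuple `(a, b, s, t)` and all function names are hypothetical and chosen only for illustration.

```python
import math
import random

# Hypothetical parameters: mu(c) = a*c + b, log_sigma(c) = s*c + t.
# A real conditional flow would stack many such invertible layers and
# learn the parameters by maximizing log_prob on survey data.

def forward(x, c, params):
    """Map data x to latent z given context c; also return log|dz/dx|."""
    a, b, s, t = params
    mu = a * c + b
    log_sigma = s * c + t
    z = (x - mu) * math.exp(-log_sigma)
    log_det = -log_sigma  # dz/dx = 1 / sigma(c)
    return z, log_det

def inverse(z, c, params):
    """Map latent z back to data space given context c."""
    a, b, s, t = params
    return (a * c + b) + math.exp(s * c + t) * z

def log_prob(x, c, params):
    """Conditional log-density via the change-of-variables formula."""
    z, log_det = forward(x, c, params)
    base = -0.5 * (z * z + math.log(2.0 * math.pi))  # standard normal
    return base + log_det

def sample(c, params, rng):
    """Draw one sample from p(x | c) by pushing noise through the inverse."""
    return inverse(rng.gauss(0.0, 1.0), c, params)

if __name__ == "__main__":
    params = (2.0, 1.0, 0.5, -0.3)
    rng = random.Random(0)
    print(log_prob(1.5, 0.2, params), sample(0.2, params, rng))
```

Because the context only enters through the shift and scale, the output is a full conditional distribution rather than a point estimate, which is the property the abstract emphasizes.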

CL Apr 12, 2021
Developing Annotated Resources for Internal Displacement Monitoring

Fabio Poletto, Yunbai Zhang, Andre Panisson et al.

This paper describes in detail the design and development of a novel annotation framework and of annotated resources for Internal Displacement, as the outcome of a collaboration with the Internal Displacement Monitoring Centre, aimed at improving the accuracy of their monitoring platform IDETECT. The schema includes a multi-faceted description of events, covering cause, number of people displaced, location and date. Higher-order facets aimed at improving information extraction, such as document relevance and type, are proposed. We also report a case study applying machine learning to the document classification tasks. Finally, we discuss the importance of standardized schemas in benchmark dataset development and their impact on the development of reliable disaster monitoring infrastructure.

SI Nov 16, 2020
Link prediction in multiplex networks via triadic closure

Alberto Aleta, Marta Tuninetti, Daniela Paolotti et al.

Link prediction algorithms can help to understand the structure and dynamics of complex systems, to reconstruct networks from incomplete data sets and to forecast future interactions in evolving networks. Available algorithms based on similarity between nodes are bounded by the limited number of links present in these networks. In this work, we reduce this intrinsic limitation and show that different kinds of relational data can be exploited to improve the prediction of new links. To this aim, we propose a novel link prediction algorithm by generalizing the Adamic-Adar method to multiplex networks composed of an arbitrary number of layers that encode diverse forms of interactions. We show that the new metric outperforms the classical single-layered Adamic-Adar score and other state-of-the-art methods across several social, biological and technological systems. As a byproduct, the coefficients that maximize the Multiplex Adamic-Adar metric indicate how the information structured in a multiplex network can be optimized for the link prediction task, revealing which layers are redundant. Interestingly, this effect can be asymmetric with respect to predictions in different layers. Our work paves the way for a deeper understanding of the role of different relational data in predicting new interactions and provides a new algorithm for link prediction in multiplex networks that can be applied to a plethora of systems.
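As a rough illustration of the idea, the sketch below computes the classical single-layer Adamic-Adar score and then combines per-layer scores with layer weights. The weighted sum is an assumption made for illustration and is not necessarily the exact Multiplex Adamic-Adar definition used in the paper; the names `build_adjacency`, `adamic_adar`, `multiplex_adamic_adar` and the `weights` mapping are hypothetical.

```python
import math
from collections import defaultdict

def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def adamic_adar(adj, u, v):
    """Classical Adamic-Adar: sum of 1/log(degree) over common neighbors."""
    score = 0.0
    for w in adj.get(u, set()) & adj.get(v, set()):
        degree = len(adj[w])
        if degree > 1:  # a common neighbor of two distinct nodes has degree >= 2
            score += 1.0 / math.log(degree)
    return score

def multiplex_adamic_adar(layers, weights, u, v):
    """Assumed generalization: weighted sum of per-layer Adamic-Adar scores.

    layers:  {layer_name: adjacency map}
    weights: {layer_name: non-negative coefficient}
    """
    return sum(weights[name] * adamic_adar(adj, u, v)
               for name, adj in layers.items())

if __name__ == "__main__":
    layers = {
        "a": build_adjacency([(1, 3), (2, 3)]),
        "b": build_adjacency([(1, 4), (2, 4), (4, 5)]),
    }
    weights = {"a": 1.0, "b": 0.5}
    print(multiplex_adamic_adar(layers, weights, 1, 2))
```

In this reading, a layer whose best-fitting weight is close to zero contributes little to the prediction, which matches the abstract's remark that the fitted coefficients reveal redundant layers.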

CY Sep 2, 2019
Learning Real Estate Automated Valuation Models from Heterogeneous Data Sources

Francesco Bergadano, Roberto Bertilone, Daniela Paolotti et al.

Real estate appraisal is a complex and important task that can be made more precise and faster with the help of automated valuation tools. Usually, the value of a property is determined by taking into account both structural and geographical characteristics. However, while geographical information is easily found, obtaining significant structural information requires the intervention of a real estate expert, a professional appraiser. In this paper, we propose a Web data acquisition methodology and a machine learning model that can be used to automatically evaluate real estate properties. This method uses data from previous appraisal documents, from the advertised prices of similar properties found via Web crawling, and from open data describing the characteristics of the corresponding geographical area. We describe a case study, applicable to the whole Italian territory and initially trained on a data set of individual homes located in the city of Turin, and analyze its prediction performance and practical applicability.