János Kertész

LG
h-index26
7papers
196citations
Novelty38%
AI Score25

7 Papers

LGFeb 17, 2023
Interpreting wealth distribution via poverty map inference using multimodal data

Lisette Espín-Noboa, János Kertész, Márton Karsai

Poverty maps are essential tools for governments and NGOs to track socioeconomic changes and adequately allocate infrastructure and services in places in need. Sensor and online crowd-sourced data combined with machine learning methods have provided a recent breakthrough in poverty map inference. However, these methods do not capture local wealth fluctuations, and are not optimized to produce accountable results that guarantee accurate predictions to all sub-populations. Here, we propose a pipeline of machine learning models to infer the mean and standard deviation of wealth across multiple geographically clustered populated places, and illustrate their performance in Sierra Leone and Uganda. These models leverage seven independent and freely available feature sources based on satellite images, and metadata collected via online crowd-sourcing and social media. Our models show that combined metadata features are the best predictors of wealth in rural areas, outperforming image-based models, which are the best for predicting the highest wealth quintiles. Our results recover the local mean and variation of wealth, and correctly capture the positive yet non-monotonous correlation between them. We further demonstrate the capabilities and limitations of model transfer across countries and the effects of data recency and other biases. Our methodology provides open tools to build towards more transparent and interpretable models to help governments and NGOs to make informed decisions based on data availability, urbanization level, and poverty thresholds.

SOC-PHAug 3, 2024
A Comparative Analysis of Wealth Index Predictions in Africa between three Multi-Source Inference Models

Márton Karsai, János Kertész, Lisette Espín-Noboa

Poverty map inference has become a critical focus of research, utilizing both traditional and modern techniques, ranging from regression models to convolutional neural networks applied to tabular data, satellite imagery, and networks. While much attention has been given to validating models during the training phase, the final predictions have received less scrutiny. In this study, we analyze the International Wealth Index (IWI) predicted by Lee and Braithwaite (2022) and Espín-Noboa et al. (2023), alongside the Relative Wealth Index (RWI) inferred by Chi et al. (2022), across six Sub-Saharan African countries. Our analysis reveals trends and discrepancies in wealth predictions between these models. In particular, significant and unexpected discrepancies between the predictions of Lee and Braithwaite and Espín-Noboa et al., even after accounting for differences in training data. In contrast, the shape of the wealth distributions predicted by Espín-Noboa et al. and Chi et al. are more closely aligned, suggesting similar levels of skewness. These findings raise concerns about the validity of certain models and emphasize the importance of rigorous audits for wealth prediction algorithms used in policy-making. Continuous validation and refinement are essential to ensure the reliability of these models, particularly when they inform poverty alleviation strategies.

LGDec 7, 2023
Coordination-free Decentralised Federated Learning on Complex Networks: Overcoming Heterogeneity

Lorenzo Valerio, Chiara Boldrini, Andrea Passarella et al.

Federated Learning (FL) is a well-known framework for successfully performing a learning task in an edge computing scenario where the devices involved have limited resources and incomplete data representation. The basic assumption of FL is that the devices communicate directly or indirectly with a parameter server that centrally coordinates the whole process, overcoming several challenges associated with it. However, in highly pervasive edge scenarios, the presence of a central controller that oversees the process cannot always be guaranteed, and the interactions (i.e., the connectivity graph) between devices might not be predetermined, resulting in a complex network structure. Moreover, the heterogeneity of data and devices further complicates the learning process. This poses new challenges from a learning standpoint that we address by proposing a communication-efficient Decentralised Federated Learning (DFL) algorithm able to cope with them. Our solution allows devices communicating only with their direct neighbours to train an accurate model, overcoming the heterogeneity induced by data and different training histories. Our results show that the resulting local models generalise better than those trained with competing approaches, and do so in a more communication-efficient way.

LGMar 23, 2024
Initialisation and Network Effects in Decentralised Federated Learning

Arash Badie-Modiri, Chiara Boldrini, Lorenzo Valerio et al.

Fully decentralised federated learning enables collaborative training of individual machine learning models on a distributed network of communicating devices while keeping the training data localised on each node. This approach avoids central coordination, enhances data privacy and eliminates the risk of a single point of failure. Our research highlights that the effectiveness of decentralised federated learning is significantly influenced by the network topology of connected devices and the learning models' initial conditions. We propose a strategy for uncoordinated initialisation of the artificial neural networks based on the distribution of eigenvector centralities of the underlying communication network, leading to a radically improved training efficiency. Additionally, our study explores the scaling behaviour and the choice of environmental parameters under our proposed initialisation strategy. This work paves the way for more efficient and scalable artificial neural network training in a distributed and uncoordinated environment, offering a deeper understanding of the intertwining roles of network structure and learning dynamics.

LGMay 3, 2024
Robustness of Decentralised Learning to Nodes and Data Disruption

Luigi Palmieri, Chiara Boldrini, Lorenzo Valerio et al.

In the vibrant landscape of AI research, decentralised learning is gaining momentum. Decentralised learning allows individual nodes to keep data locally where they are generated and to share knowledge extracted from local data among themselves through an interactive process of collaborative refinement. This paradigm supports scenarios where data cannot leave local nodes due to privacy or sovereignty reasons or real-time constraints imposing proximity of models to locations where inference has to be carried out. The distributed nature of decentralised learning implies significant new research challenges with respect to centralised learning. Among them, in this paper, we focus on robustness issues. Specifically, we study the effect of nodes' disruption on the collective learning process. Assuming a given percentage of "central" nodes disappear from the network, we focus on different cases, characterised by (i) different distributions of data across nodes and (ii) different times when disruption occurs with respect to the start of the collaborative learning task. Through these configurations, we are able to show the non-trivial interplay between the properties of the network connecting nodes, the persistence of knowledge acquired collectively before disruption or lack thereof, and the effect of data availability pre- and post-disruption. Our results show that decentralised learning processes are remarkably robust to network disruption. As long as even minimum amounts of data remain available somewhere in the network, the learning process is able to recover from disruptions and achieve significant classification accuracy. This clearly varies depending on the remaining connectivity after disruption, but we show that even nodes that remain completely isolated can retain significant knowledge acquired before the disruption.

SOC-PHMay 23, 2013
The most controversial topics in Wikipedia: A multilingual and geographical analysis

Taha Yasseri, Anselm Spoerri, Mark Graham et al.

We present, visualize and analyse the similarities and differences between the controversial topics related to "edit wars" identified in 10 different language versions of Wikipedia. After a brief review of the related work we describe the methods developed to locate, measure, and categorize the controversial topics in the different languages. Visualizations of the degree of overlap between the top 100 lists of most controversial articles in different languages and the content related to geographical locations will be presented. We discuss what the presented analysis and visualizations can tell us about the multicultural aspects of Wikipedia and practices of peer-production. Our results indicate that Wikipedia is more than just an encyclopaedia; it is also a window into convergent and divergent social-spatial priorities, interests and preferences.

CLApr 12, 2012
A practical approach to language complexity: a Wikipedia case study

Taha Yasseri, András Kornai, János Kertész

In this paper we present statistical analysis of English texts from Wikipedia. We try to address the issue of language complexity empirically by comparing the simple English Wikipedia (Simple) to comparable samples of the main English Wikipedia (Main). Simple is supposed to use a more simplified language with a limited vocabulary, and editors are explicitly requested to follow this guideline, yet in practice the vocabulary richness of both samples are at the same level. Detailed analysis of longer units (n-grams of words and part of speech tags) shows that the language of Simple is less complex than that of Main primarily due to the use of shorter sentences, as opposed to drastically simplified syntax or vocabulary. Comparing the two language varieties by the Gunning readability index supports this conclusion. We also report on the topical dependence of language complexity, e.g. that the language is more advanced in conceptual articles compared to person-based (biographical) and object-based articles. Finally, we investigate the relation between conflict and language complexity by analyzing the content of the talk pages associated to controversial and peacefully developing articles, concluding that controversy has the effect of reducing language complexity.