SYMay 17, 2018
A Decomposition-based Approach towards the Control of Boolean Networks (Technical Report)Soumya Paul, Cui Su, Jun Pang et al.
We study the problem of computing a minimal subset of nodes of a given asynchronous Boolean network that need to be controlled to drive its dynamics from an initial steady state (or attractor) to a target steady state. Due to the phenomenon of state-space explosion, a simple global approach that performs computations on the entire network, may not scale well for large networks. We believe that efficient algorithms for such networks must exploit the structure of the networks together with their dynamics. Taking such an approach, we derive a decomposition-based solution to the minimal control problem which can be significantly faster than the existing approaches on large networks. We apply our solution to both real-life biological networks and randomly generated networks, demonstrating promising results.
CRJan 24, 2023
Membership Inference of Diffusion ModelsHailong Hu, Jun Pang
Recent years have witnessed the tremendous success of diffusion models in data synthesis. However, when diffusion models are applied to sensitive data, they also give rise to severe privacy concerns. In this paper, we systematically present the first study about membership inference attacks against diffusion models, which aims to infer whether a sample was used to train the model. Two attack methods are proposed, namely loss-based and likelihood-based attacks. Our attack methods are evaluated on several state-of-the-art diffusion models, over different datasets in relation to privacy-sensitive data. Extensive experimental evaluations show that our attacks can achieve remarkable performance. Furthermore, we exhaustively investigate various factors which can affect attack performance. Finally, we also evaluate the performance of our attack methods on diffusion models trained with differential privacy.
SYDec 5, 2020
Online Observability of Boolean Control NetworksGuisen Wu, Liyun Dai, Zhiming Liu et al.
Observabililty is an important topic of Boolean control networks (BCNs). In this paper, we propose a new type of observability named online observability to present the sufficient and necessary condition of determining the initial states of BCNs, when their initial states cannot be reset. And we design an algorithm to decide whether a BCN has the online observability. Moreover, we prove that a BCN is identifiable iff it satisfies controllability and the online observability, which reveals the essence of identification problem of BCNs.
LGDec 17, 2025
Bits for Privacy: Evaluating Post-Training Quantization via Membership InferenceChenxiang Zhang, Tongxi Qu, Zhong Li et al.
Deep neural networks are widely deployed with quantization techniques to reduce memory and computational costs by lowering the numerical precision of their parameters. While quantization alters model parameters and their outputs, existing privacy analyses primarily focus on full-precision models, leaving a gap in understanding how bit-width reduction can affect privacy leakage. We present the first systematic study of the privacy-utility relationship in post-training quantization (PTQ), a versatile family of methods that can be applied to pretrained models without further training. Using membership inference attacks as our evaluation framework, we analyze three popular PTQ algorithms-AdaRound, BRECQ, and OBC-across multiple precision levels (4-bit, 2-bit, and 1.58-bit) on CIFAR-10, CIFAR-100, and TinyImageNet datasets. Our findings consistently show that low-precision PTQs can reduce privacy leakage. In particular, lower-precision models demonstrate up to an order of magnitude reduction in membership inference vulnerability compared to their full-precision counterparts, albeit at the cost of decreased utility. Additional ablation studies on the 1.58-bit quantization level show that quantizing only the last layer at higher precision enables fine-grained control over the privacy-utility trade-off. These results offer actionable insights for practitioners to balance efficiency, utility, and privacy protection in real-world deployments.
CLJun 27, 2022
A Multilingual Dataset of COVID-19 Vaccination Attitudes on TwitterNinghan Chen, Xihui Chen, Jun Pang
Vaccine hesitancy is considered as one main cause of the stagnant uptake ratio of COVID-19 vaccines in Europe and the US where vaccines are sufficiently supplied. Fast and accurate grasp of public attitudes toward vaccination is critical to address vaccine hesitancy, and social media platforms have proved to be an effective source of public opinions. In this paper, we describe the collection and release of a dataset of tweets related to COVID-19 vaccines. This dataset consists of the IDs of 2,198,090 tweets collected from Western Europe, 17,934 of which are annotated with the originators' vaccination stances. Our annotation will facilitate using and developing data-driven models to extract vaccination attitudes from social media posts and thus further confirm the power of social media in public health surveillance. To lay the groundwork for future research, we not only perform statistical analysis and visualisation of our dataset, but also evaluate and compare the performance of established text-based benchmarks in vaccination stance extraction. We demonstrate one potential use of our data in practice in tracking the temporal changes of public COVID-19 vaccination attitudes.
SIMar 21, 2022
Unsupervised Network Embedding Beyond HomophilyZhiqiang Zhong, Guadalupe Gonzalez, Daniele Grattarola et al.
Network embedding (NE) approaches have emerged as a predominant technique to represent complex networks and have benefited numerous tasks. However, most NE approaches rely on a homophily assumption to learn embeddings with the guidance of supervisory signals, leaving the unsupervised heterophilous scenario relatively unexplored. This problem becomes especially relevant in fields where a scarcity of labels exists. Here, we formulate the unsupervised NE task as an r-ego network discrimination problem and develop the SELENE framework for learning on networks with homophily and heterophily. Specifically, we design a dual-channel feature embedding pipeline to discriminate r-ego networks using node attributes and structural information separately. We employ heterophily adapted self-supervised learning objective functions to optimise the framework to learn intrinsic node embeddings. We show that SELENE's components improve the quality of node embeddings, facilitating the discrimination of connected heterophilous nodes. Comprehensive empirical evaluations on both synthetic and real-world datasets with varying homophily ratios validate the effectiveness of SELENE in homophilous and heterophilous settings showing an up to 12.52% clustering accuracy gain.
LGMay 19, 2022
Simplifying Node Classification on Heterophilous Graphs with Compatible Label PropagationZhiqiang Zhong, Sergey Ivanov, Jun Pang
Graph Neural Networks (GNNs) have been predominant for graph learning tasks; however, recent studies showed that a well-known graph algorithm, Label Propagation (LP), combined with a shallow neural network can achieve comparable performance to GNNs in semi-supervised node classification on graphs with high homophily. In this paper, we show that this approach falls short on graphs with low homophily, where nodes often connect to the nodes of the opposite classes. To overcome this, we carefully design a combination of a base predictor with LP algorithm that enjoys a closed-form solution as well as convergence guarantees. Our algorithm first learns the class compatibility matrix and then aggregates label predictions using LP algorithm weighted by class compatibilities. On a wide variety of benchmarks, we show that our approach achieves the leading performance on graphs with various levels of homophily. Meanwhile, it has orders of magnitude fewer parameters and requires less execution time. Empirical evaluations demonstrate that simple adaptations of LP can be competitive in semi-supervised node classification in both homophily and heterophily regimes.
SYJun 27, 2018
Towards the Existential Control of Boolean Networks: A Preliminary Report (Extended Abstract)Soumya Paul, Jun Pang, Cui Su
Given a Boolean network BN and a subset A of attractors of BN, we study the problem of identifying a minimal subset C of vertices of BN, such that the dynamics of BN can reach from a state s in any attractor As in A to any attractor At in A by controlling or toggling a subset of vertices in C in a single time step. We describe a method based on the decomposition of the network structure into strongly connected components called blocks. The control subset can be locally computed for each such block and the results then merged to derive the global control subset C. This potentially improves the efficiency for many real-life networks that are large but modular and well-structured. We are currently in the process of implementing our method in software.
CRJun 8, 2023
PriSampler: Mitigating Property Inference of Diffusion ModelsHailong Hu, Jun Pang
Diffusion models have been remarkably successful in data synthesis. However, when these models are applied to sensitive datasets, such as banking and human face data, they might bring up severe privacy concerns. This work systematically presents the first privacy study about property inference attacks against diffusion models, where adversaries aim to extract sensitive global properties of its training set from a diffusion model. Specifically, we focus on the most practical attack scenario: adversaries are restricted to accessing only synthetic data. Under this realistic scenario, we conduct a comprehensive evaluation of property inference attacks on various diffusion models trained on diverse data types, including tabular and image datasets. A broad range of evaluations reveals that diffusion models and their samplers are universally vulnerable to property inference attacks. In response, we propose a new model-agnostic plug-in method PriSampler to mitigate the risks of the property inference of diffusion models. PriSampler can be directly applied to well-trained diffusion models and support both stochastic and deterministic sampling. Extensive experiments illustrate the effectiveness of our defense, and it can lead adversaries to infer the proportion of properties as close as predefined values that model owners wish. Notably, PriSampler also shows its significantly superior performance to diffusion models trained with differential privacy on both model utility and defense performance. This work will elevate the awareness of preventing property inference attacks and encourage privacy-preserving synthetic data release.
CRJun 8, 2023
Ownership Protection of Generative Adversarial NetworksHailong Hu, Jun Pang
Generative adversarial networks (GANs) have shown remarkable success in image synthesis, making GAN models themselves commercially valuable to legitimate model owners. Therefore, it is critical to technically protect the intellectual property of GANs. Prior works need to tamper with the training set or training process, and they are not robust to emerging model extraction attacks. In this paper, we propose a new ownership protection method based on the common characteristics of a target model and its stolen models. Our method can be directly applicable to all well-trained GANs as it does not require retraining target models. Extensive experimental results show that our new method can achieve the best protection performance, compared to the state-of-the-art methods. Finally, we demonstrate the effectiveness of our method with respect to the number of generations of model extraction attacks, the number of generated samples, different datasets, as well as adaptive attacks.
SIJun 27, 2022
"Double vaccinated, 5G boosted!": Learning Attitudes towards COVID-19 Vaccination from Social MediaNinghan Chen, Xihui Chen, Zhiqiang Zhong et al.
To address the vaccine hesitancy which impairs the efforts of the COVID-19 vaccination campaign, it is imperative to understand public vaccination attitudes and timely grasp their changes. In spite of reliability and trustworthiness, conventional attitude collection based on surveys is time-consuming and expensive, and cannot follow the fast evolution of vaccination attitudes. We leverage the textual posts on social media to extract and track users' vaccination stances in near real time by proposing a deep learning framework. To address the impact of linguistic features such as sarcasm and irony commonly used in vaccine-related discourses, we integrate into the framework the recent posts of a user's social network neighbours to help detect the user's genuine attitude. Based on our annotated dataset from Twitter, the models instantiated from our framework can increase the performance of attitude extraction by up to 23% compared to state-of-the-art text-only models. Using this framework, we successfully validate the feasibility of using social media to track the evolution of vaccination attitudes in real life. We further show one practical use of our framework by validating the possibility to forecast a user's vaccine hesitancy changes with information perceived from social media.
LGJun 26, 2024Code
KAGNNs: Kolmogorov-Arnold Networks meet Graph LearningRoman Bresson, Giannis Nikolentzos, George Panagopoulos et al.
In recent years, Graph Neural Networks (GNNs) have become the de facto tool for learning node and graph representations. Most GNNs typically consist of a sequence of neighborhood aggregation (a.k.a., message-passing) layers, within which the representation of each node is updated based on those of its neighbors. The most expressive message-passing GNNs can be obtained through the use of the sum aggregator and of MLPs for feature transformation, thanks to their universal approximation capabilities. However, the limitations of MLPs recently motivated the introduction of another family of universal approximators, called Kolmogorov-Arnold Networks (KANs) which rely on a different representation theorem. In this work, we compare the performance of KANs against that of MLPs on graph learning tasks. We implement three new KAN-based GNN layers, inspired respectively by the GCN, GAT and GIN layers. We evaluate two different implementations of KANs using two distinct base families of functions, namely B-splines and radial basis functions. We perform extensive experiments on node classification, link prediction, graph classification and graph regression datasets. Our results indicate that KANs are on-par with or better than MLPs on all tasks studied in this paper. We also show that the size and training speed of RBF-based KANs is only marginally higher than for MLPs, making them viable alternatives. Code available at https://github.com/RomanBresson/KAGNN.
AIOct 4, 2023
On Quantified Observability Analysis in Multiagent SystemsChunyan Mu, Jun Pang
In multiagent systems (MASs), agents' observation upon system behaviours may improve the overall team performance, but may also leak sensitive information to an observer. A quantified observability analysis can thus be useful to assist decision-making in MASs by operators seeking to optimise the relationship between performance effectiveness and information exposure through observations in practice. This paper presents a novel approach to quantitatively analysing the observability properties in MASs. The concept of opacity is applied to formally express the characterisation of observability in MASs modelled as partially observable multiagent systems. We propose a temporal logic oPATL to reason about agents' observability with quantitative goals, which capture the probability of information transparency of system behaviours to an observer, and develop verification techniques for quantitatively analysing such properties. We implement the approach as an extension of the PRISM model checker, and illustrate its applicability via several examples.
LGMar 10, 2025
From Centralized to Decentralized Federated Learning: Theoretical Insights, Privacy Preservation, and Robustness ChallengesQiongxiu Li, Wenrui Yu, Yufei Xia et al.
Federated Learning (FL) enables collaborative learning without directly sharing individual's raw data. FL can be implemented in either a centralized (server-based) or decentralized (peer-to-peer) manner. In this survey, we present a novel perspective: the fundamental difference between centralized FL (CFL) and decentralized FL (DFL) is not merely the network topology, but the underlying training protocol: separate aggregation vs. joint optimization. We argue that this distinction in protocol leads to significant differences in model utility, privacy preservation, and robustness to attacks. We systematically review and categorize existing works in both CFL and DFL according to the type of protocol they employ. This taxonomy provides deeper insights into prior research and clarifies how various approaches relate or differ. Through our analysis, we identify key gaps in the literature. In particular, we observe a surprising lack of exploration of DFL approaches based on distributed optimization methods, despite their potential advantages. We highlight this under-explored direction and call for more research on leveraging distributed optimization for federated learning. Overall, this work offers a comprehensive overview from centralized to decentralized FL, sheds new light on the core distinctions between approaches, and outlines open challenges and future directions for the field.
LGMar 28, 2024
Uplift Modeling Under Limited SupervisionGeorge Panagopoulos, Daniele Malitesta, Fragkiskos D. Malliaros et al.
Estimating causal effects in e-commerce tends to involve costly treatment assignments which can be impractical in large-scale settings. Leveraging machine learning to predict such treatment effects without actual intervention is a standard practice to diminish the risk. However, existing methods for treatment effect prediction tend to rely on training sets of substantial size, which are built from real experiments and are thus inherently risky to create. In this work we propose a graph neural network to diminish the required training set size, relying on graphs that are common in e-commerce data. Specifically, we view the problem as node regression with a restricted number of labeled instances, develop a two-model neural architecture akin to previous causal effect estimators, and test varying message-passing layers for encoding. Furthermore, as an extra step, we combine the model with an acquisition function to guide the creation of the training set in settings with extremely low experimental budget. The framework is flexible since each step can be used separately with other models or treatment policies. The experiments on real large-scale networks indicate a clear advantage of our methodology over the state of the art, which in many cases performs close to random, underlining the need for models that can generalize with limited supervision to reduce experimental risks.
CLJan 26
Malicious Repurposing of Open Science Artefacts by Using Large Language ModelsZahra Hashemi, Zhiqiang Zhong, Jun Pang et al.
The rapid evolution of large language models (LLMs) has fuelled enthusiasm about their role in advancing scientific discovery, with studies exploring LLMs that autonomously generate and evaluate novel research ideas. However, little attention has been given to the possibility that such models could be exploited to produce harmful research by repurposing open science artefacts for malicious ends. We fill the gap by introducing an end-to-end pipeline that first bypasses LLM safeguards through persuasion-based jailbreaking, then reinterprets NLP papers to identify and repurpose their artefacts (datasets, methods, and tools) by exploiting their vulnerabilities, and finally assesses the safety of these proposals using our evaluation framework across three dimensions: harmfulness, feasibility of misuse, and soundness of technicality. Overall, our findings demonstrate that LLMs can generate harmful proposals by repurposing ethically designed open artefacts; however, we find that LLMs acting as evaluators strongly disagree with one another on evaluation outcomes: GPT-4.1 assigns higher scores (indicating greater potential harms, higher soundness and feasibility of misuse), Gemini-2.5-pro is markedly stricter, and Grok-3 falls between these extremes. This indicates that LLMs cannot yet serve as reliable judges in a malicious evaluation setup, making human evaluation essential for credible dual-use risk assessment.
LGOct 6, 2025
How does the optimizer implicitly bias the model merging loss landscape?Chenxiang Zhang, Alexander Theus, Damien Teney et al.
Model merging methods combine models with different capabilities into a single one while maintaining the same inference cost. Two popular approaches are linear interpolation, which linearly interpolates between model weights, and task arithmetic, which combines task vectors obtained by the difference between finetuned and base models. While useful in practice, what properties make merging effective are poorly understood. This paper explores how the optimization process affects the loss landscape geometry and its impact on merging success. We show that a single quantity -- the effective noise scale -- unifies the impact of optimizer and data choices on model merging. Across architectures and datasets, the effectiveness of merging success is a non-monotonic function of effective noise, with a distinct optimum. Decomposing this quantity, we find that larger learning rates, stronger weight decay, smaller batch sizes, and data augmentation all independently modulate the effective noise scale, exhibiting the same qualitative trend. Unlike prior work that connects optimizer noise to the flatness or generalization of individual minima, we show that it also affects the global loss landscape, predicting when independently trained solutions can be merged. Our findings broaden the understanding of how optimization shapes the loss landscape geometry and its downstream consequences for model merging, suggesting the possibility of further manipulating the training dynamics to improve merging effectiveness.
HCJul 26, 2025
TS-Insight: Visualizing Thompson Sampling for Verification and XAIParsa Vares, Éloi Durant, Jun Pang et al.
Thompson Sampling (TS) and its variants are powerful Multi-Armed Bandit algorithms used to balance exploration and exploitation strategies in active learning. Yet, their probabilistic nature often turns them into a "black box", hindering debugging and trust. We introduce TS-Insight, a visual analytics tool explicitly designed to shed light on the internal decision mechanisms of Thompson Sampling-based algorithms, for model developers. It comprises multiple plots, tracing for each arm the evolving posteriors, evidence counts, and sampling outcomes, enabling the verification, diagnosis, and explainability of exploration/exploitation dynamics. This tool aims at fostering trust and facilitating effective debugging and deployment in complex binary decision-making scenarios especially in sensitive domains requiring interpretable decision-making.
AIJun 30, 2025
Evaluating Multi-Agent Defences Against Jailbreaking Attacks on Large Language ModelsMaria Carolina Cornelia Wit, Jun Pang
Recent advances in large language models (LLMs) have raised concerns about jailbreaking attacks, i.e., prompts that bypass safety mechanisms. This paper investigates the use of multi-agent LLM systems as a defence against such attacks. We evaluate three jailbreaking strategies, including the original AutoDefense attack and two from Deepleaps: BetterDan and JB. Reproducing the AutoDefense framework, we compare single-agent setups with two- and three-agent configurations. Our results show that multi-agent systems enhance resistance to jailbreaks, especially by reducing false negatives. However, its effectiveness varies by attack type, and it introduces trade-offs such as increased false positives and computational overhead. These findings point to the limitations of current automated defences and suggest directions for improving alignment robustness in future LLM systems.
QMMar 18, 2025
Efficient Data Selection for Training Genomic Perturbation ModelsGeorge Panagopoulos, Johannes F. Lutzeyer, Sofiane Ennadir et al.
Genomic studies face a vast hypothesis space, while interventions such as gene perturbations remain costly and time-consuming. To accelerate such experiments, gene perturbation models predict the transcriptional outcome of interventions. Since constructing the training set is challenging, active learning is often employed in a "lab-in-the-loop" process. While this strategy makes training more targeted, it is substantially slower, as it fails to exploit the inherent parallelizability of Perturb-seq experiments. Here, we focus on graph neural network-based gene perturbation models and propose a subset selection method that, unlike active learning, selects the training perturbations in one shot. Our method chooses the interventions that maximize the propagation of the supervision signal to the model. The selection criterion is defined over the input knowledge graph and is optimized with submodular maximization, ensuring a near-optimal guarantee. Experimental results across multiple datasets show that, in addition to providing months of acceleration compared to active learning, the method improves the stability of perturbation choices while maintaining competitive predictive accuracy.
CRJan 6, 2021
Model Extraction and Defenses on Generative Adversarial NetworksHailong Hu, Jun Pang
Model extraction attacks aim to duplicate a machine learning model through query access to a target model. Early studies mainly focus on discriminative models. Despite the success, model extraction attacks against generative models are less well explored. In this paper, we systematically study the feasibility of model extraction attacks against generative adversarial networks (GANs). Specifically, we first define accuracy and fidelity on model extraction attacks against GANs. Then we study model extraction attacks against GANs from the perspective of accuracy extraction and fidelity extraction, according to the adversary's goals and background knowledge. We further conduct a case study where an adversary can transfer knowledge of the extracted model which steals a state-of-the-art GAN trained with more than 3 million images to new domains to broaden the scope of applications of model extraction attacks. Finally, we propose effective defense techniques to safeguard GANs, considering a trade-off between the utility and security of GAN models.
SINov 5, 2020
Semantic and Relational Spaces in Science of Science: Deep Learning Models for Article VectorisationDiego Kozlowski, Jennifer Dusdal, Jun Pang et al.
Over the last century, we observe a steady and exponentially growth of scientific publications globally. The overwhelming amount of available literature makes a holistic analysis of the research within a field and between fields based on manual inspection impossible. Automatic techniques to support the process of literature review are required to find the epistemic and social patterns that are embedded in scientific publications. In computer sciences, new tools have been developed to deal with large volumes of data. In particular, deep learning techniques open the possibility of automated end-to-end models to project observations to a new, low-dimensional space where the most relevant information of each observation is highlighted. Using deep learning to build new representations of scientific publications is a growing but still emerging field of research. The aim of this paper is to discuss the potential and limits of deep learning for gathering insights about scientific research articles. We focus on document-level embeddings based on the semantic and relational aspects of articles, using Natural Language Processing (NLP) and Graph Neural Networks (GNNs). We explore the different outcomes generated by those techniques. Our results show that using NLP we can encode a semantic space of articles, while with GNN we are able to build a relational space where the social practices of a research community are also encoded.
LGOct 26, 2020
Personalised Meta-path Generation for Heterogeneous GNNsZhiqiang Zhong, Cheng-Te Li, Jun Pang
Recently, increasing attention has been paid to heterogeneous graph representation learning (HGRL), which aims to embed rich structural and semantic information in heterogeneous information networks (HINs) into low-dimensional node representations. To date, most HGRL models rely on hand-crafted meta-paths. However, the dependency on manually-defined meta-paths requires domain knowledge, which is difficult to obtain for complex HINs. More importantly, the pre-defined or generated meta-paths of all existing HGRL methods attached to each node type or node pair cannot be personalised to each individual node. To fully unleash the power of HGRL, we present a novel framework, Personalised Meta-path based Heterogeneous Graph Neural Networks (PM-HGNN), to jointly generate meta-paths that are personalised for each individual node in a HIN and learn node representations for the target downstream task like node classification. Precisely, PM-HGNN treats the meta-path generation as a Markov Decision Process and utilises a policy network to adaptively generate a meta-path for each individual node and simultaneously learn effective node representations. The policy network is trained with deep reinforcement learning by exploiting the performance improvement on a downstream task. We further propose an extension, PM-HGNN++, to better encode relational structure and accelerate the training during the meta-path generation. Experimental results reveal that both PM-HGNN and PM-HGNN++ can significantly and consistently outperform 16 competing baselines and state-of-the-art methods in various settings of node classification. Qualitative analysis also shows that PM-HGNN++ can identify meaningful meta-paths overlooked by human knowledge.
AIOct 1, 2020
Multi-grained Semantics-aware Graph Neural NetworksZhiqiang Zhong, Cheng-Te Li, Jun Pang
Graph Neural Networks (GNNs) are powerful techniques in representation learning for graphs and have been increasingly deployed in a multitude of different applications that involve node- and graph-wise tasks. Most existing studies solve either the node-wise task or the graph-wise task independently while they are inherently correlated. This work proposes a unified model, AdamGNN, to interactively learn node and graph representations in a mutual-optimisation manner. Compared with existing GNN models and graph pooling methods, AdamGNN enhances the node representation with the learned multi-grained semantics and avoids losing node features and graph structure information during pooling. Specifically, a differentiable pooling operator is proposed to adaptively generate a multi-grained structure that involves meso- and macro-level semantic information in the graph. We also devise the unpooling operator and the flyback aggregator in AdamGNN to better leverage the multi-grained semantics to enhance node representations. The updated node representations can further adjust the graph representation in the next iteration. Experiments on 14 real-world graph datasets show that AdamGNN can significantly outperform 17 competing models on both node- and graph-wise tasks. The ablation studies confirm the effectiveness of AdamGNN's components, and the last empirical analysis further reveals the ingenious ability of AdamGNN in capturing long-range interactions.
LGSep 8, 2020
Hierarchical Message-Passing Graph Neural NetworksZhiqiang Zhong, Cheng-Te Li, Jun Pang
Graph Neural Networks (GNNs) have become a prominent approach to machine learning with graphs and have been increasingly applied in a multitude of domains. Nevertheless, since most existing GNN models are based on flat message-passing mechanisms, two limitations need to be tackled: (i) they are costly in encoding long-range information spanning the graph structure; (ii) they are failing to encode features in the high-order neighbourhood in the graphs as they only perform information aggregation across the observed edges in the original graph. To deal with these two issues, we propose a novel Hierarchical Message-passing Graph Neural Networks framework. The key idea is generating a hierarchical structure that re-organises all nodes in a flat graph into multi-level super graphs, along with innovative intra- and inter-level propagation manners. The derived hierarchy creates shortcuts connecting far-away nodes so that informative long-range interactions can be efficiently accessed via message passing and incorporates meso- and macro-level semantics into the learned node representations. We present the first model to implement this framework, termed Hierarchical Community-aware Graph Neural Network (HC-GNN), with the assistance of a hierarchical community detection algorithm. The theoretical analysis illustrates HC-GNN's remarkable capacity in capturing long-range information without introducing heavy additional computation complexity. Empirical experiments conducted on 9 datasets under transductive, inductive, and few-shot settings exhibit that HC-GNN can outperform state-of-the-art GNN models in network analysis tasks, including node classification, link prediction, and community detection. Moreover, the model analysis further demonstrates HC-GNN's robustness facing graph sparsity and the flexibility in incorporating different GNN encoders.
SIAug 12, 2020
An Exploratory Study of COVID-19 Information on Twitter in the Greater RegionNinghan Chen, Zhiqiang Zhong, Jun Pang
The outbreak of the COVID-19 leads to a burst of information in major online social networks (OSNs). Facing this constantly changing situation, OSNs have become an essential platform for people expressing opinions and seeking up-to-the-minute information. Thus, discussions on OSNs may become a reflection of reality. This paper aims to figure out the distinctive characteristics of the Greater Region (GR) through conducting a data-driven exploratory study of Twitter COVID-19 information in the GR and related countries using machine learning and representation learning methods. We find that tweets volume and COVID-19 cases in GR and related countries are correlated, but this correlation only exists in a particular period of the pandemic. Moreover, we plot the changing of topics in each country and region from 2020-01-22 to 2020-06-05, figuring out the main differences between GR and related countries.
CRAug 25, 2018
Formal Analysis of an E-Health ProtocolNaipeng Dong, Hugo Jonker, Jun Pang
Given the sensitive nature of health data, security and privacy in e-health systems is of prime importance. It is crucial that an e-health system must ensure that users remain private - even if they are bribed or coerced to reveal themselves, or others: a pharmaceutical company could, for example, bribe a pharmacist to reveal information which breaks a doctor's privacy. In this paper, we first identify and formalise several new but important privacy properties on enforcing doctor privacy. Then we analyse the security and privacy of a complicated and practical e-health protocol (DLV08). Our analysis uncovers ambiguities in the protocol, and shows to what extent these new privacy properties as well as other security properties (such as secrecy and authentication) and privacy properties (such as anonymity and untraceability) are satisfied by the protocol. Finally, we address the found ambiguities which result in both security and privacy flaws, and propose suggestions for fixing them.
CRFeb 12, 2018
Tagvisor: A Privacy Advisor for Sharing HashtagsYang Zhang, Mathias Humbert, Tahleen Rahman et al.
Hashtag has emerged as a widely used concept of popular culture and campaigns, but its implications on people's privacy have not been investigated so far. In this paper, we present the first systematic analysis of privacy issues induced by hashtags. We concentrate in particular on location, which is recognized as one of the key privacy concerns in the Internet era. By relying on a random forest model, we show that we can infer a user's precise location from hashtags with accuracy of 70\% to 76\%, depending on the city. To remedy this situation, we introduce a system called Tagvisor that systematically suggests alternative hashtags if the user-selected ones constitute a threat to location privacy. Tagvisor realizes this by means of three conceptually different obfuscation techniques and a semantics-based metric for measuring the consequent utility loss. Our findings show that obfuscating as little as two hashtags already provides a near-optimal trade-off between privacy and utility in our dataset. This in particular renders Tagvisor highly time-efficient, and thus, practical in real-world settings.
CRAug 28, 2017
walk2friends: Inferring Social Links from Mobility ProfilesMichael Backes, Mathias Humbert, Jun Pang et al.
The development of positioning technologies has resulted in an increasing amount of mobility data being available. While bringing a lot of convenience to people's life, such availability also raises serious concerns about privacy. In this paper, we concentrate on one of the most sensitive information that can be inferred from mobility data, namely social relationships. We propose a novel social relation inference attack that relies on an advanced feature learning technique to automatically summarize users' mobility features. Compared to existing approaches, our attack is able to predict any two individuals' social relation, and it does not require the adversary to have any prior knowledge on existing social relations. These advantages significantly increase the applicability of our attack and the scope of the privacy assessment. Extensive experiments conducted on a large dataset demonstrate that our inference attack is effective, and achieves between 13% to 20% improvement over the best state-of-the-art scheme. We propose three defense mechanisms -- hiding, replacement and generalization -- and evaluate their effectiveness for mitigating the social link privacy risks stemming from mobility data sharing. Our experimental results show that both hiding and replacement mechanisms outperform generalization. Moreover, hiding and replacement achieve a comparable trade-off between utility and privacy, the former preserving better utility and the latter providing better privacy.
SEMay 26, 2016
Should We Learn Probabilistic Models for Model Checking? A New Approach and An Empirical StudyJingyi Wang, Jun Sun, Qixia Yuan et al.
Many automated system analysis techniques (e.g., model checking, model-based testing) rely on first obtaining a model of the system under analysis. System modeling is often done manually, which is often considered as a hindrance to adopt model-based system analysis and development techniques. To overcome this problem, researchers have proposed to automatically "learn" models based on sample system executions and shown that the learned models can be useful sometimes. There are however many questions to be answered. For instance, how much shall we generalize from the observed samples and how fast would learning converge? Or, would the analysis result based on the learned model be more accurate than the estimation we could have obtained by sampling many system executions within the same amount of time? In this work, we investigate existing algorithms for learning probabilistic models for model checking, propose an evolution-based approach for better controlling the degree of generalization and conduct an empirical study in order to answer the questions. One of our findings is that the effectiveness of learning may sometimes be limited.
CEApr 28, 2016
Fast Simulation of Probabilistic Boolean Networks (Technical Report)Andrzej Mizera, Jun Pang, Qixia Yuan
Probabilistic Boolean networks (PBNs) is an important mathematical framework widely used for modelling and analysing biological systems. PBNs are suited for modelling large biological systems, which more and more often arise in systems biology. However, the large system size poses a~significant challenge to the analysis of PBNs, in particular, to the crucial analysis of their steady-state behaviour. Numerical methods for performing steady-state analyses suffer from the state-space explosion problem, which makes the utilisation of statistical methods the only viable approach. However, such methods require long simulations of PBNs, rendering the simulation speed a crucial efficiency factor. For large PBNs and high estimation precision requirements, a slow simulation speed becomes an obstacle. In this paper, we propose a structure-based method for fast simulation of PBNs. This method first performs a network reduction operation and then divides nodes into groups for parallel simulation. Experimental results show that our method can lead to an approximately 10 times speedup for computing steady-state probabilities of a real-life biological network.
SEJun 10, 2015
Proceedings 4th International Workshop on Engineering Safety and Security SystemsJun Pang, Yang Liu, Sjouke Mauw
The present volume contains the proceedings of the Fourth International Workshop on Engineering Safety and Security Systems (ESSS'15). The workshop was held in Oslo, Norway, on June 22nd, 2015, as a satellite event of the 20th International Symposium on Formal Methods (FM'15).
SEMay 3, 2014
Proceedings Third International Workshop on Engineering Safety and Security SystemsJun Pang, Yang Liu
The International Workshop on Engineering Safety and Security Systems (ESSS) aims at contributing to the challenge of constructing reliable and secure systems. The workshop covers areas such as formal specification, type checking, model checking, program analysis/transformation, model-based testing and model-driven software construction. The workshop will bring together researchers and industry R&D expertise together to exchange their knowledge, discuss their research findings, and explore potential collaborations. The main theme of the workshop is methods and techniques for constructing large reliable and secure systems. The goal of the workshop is to establish a platform for the exchange of ideas, discussion, cross-fertilization, inspiration, co-operation, and dissemination.
CRMar 10, 2014
Stateful Security Protocol VerificationLi Li, Jun Pang, Yang Liu et al.
A long-standing research problem in security protocol design is how to efficiently verify security protocols with tamper-resistant global states. In this paper, we address this problem by first proposing a protocol specification framework, which explicitly represents protocol execution states and state transformations. Secondly, we develop an algorithm for verifying security properties by utilizing the key ingredients of the first-order reasoning for reachability analysis, while tracking state transformation and checking the validity of newly generated states. Our verification algorithm is proven to be (partially) correct, if it terminates. We have implemented the proposed framework and verification algorithms in a tool named SSPA, and evaluate it using a number of stateful security protocols. The experimental results show that our approach is not only feasible but also practically efficient. In particular, we have found a security flaw on the digital envelope protocol, which could not be detected by existing security protocol verifiers.
CRApr 9, 2013
A New Access Control Scheme for Facebook-style Social NetworksJun Pang, Yang Zhang
The popularity of online social networks (OSNs) makes the protection of users' private information an important but scientifically challenging problem. In the literature, relationship-based access control schemes have been proposed to address this problem. However, with the dynamic developments of OSNs, we identify new access control requirements which cannot be fully captured by the current schemes. In this paper, we focus on public information in OSNs and treat it as a new dimension which users can use to regulate access to their resources. We define a new OSN model containing users and their relationships as well as public information. Based on this model, we introduce a variant of hybrid logic for formulating access control policies. We exploit a type of category information and relationship hierarchy to further extend our logic for its usage in practice. In the end, we propose a few solutions to address the problem of information reliability in OSNs, and formally model collaborative access control in our access control scheme.