SI · Feb 25, 2024
Towards Fair Graph Anomaly Detection: Problem, Benchmark Datasets, and Evaluation
Neng Kai Nigel Neo, Yeon-Chang Lee, Yiqiao Jin et al.
The Fair Graph Anomaly Detection (FairGAD) problem aims to accurately detect anomalous nodes in an input graph while avoiding biased predictions against individuals from sensitive subgroups. However, the current literature does not comprehensively discuss this problem, nor does it provide realistic datasets that encompass actual graph structures, anomaly labels, and sensitive attributes. To bridge this gap, we introduce a formal definition of the FairGAD problem and present two novel datasets constructed from the social media platforms Reddit and Twitter. These datasets comprise 1.2 million and 400,000 edges associated with 9,000 and 47,000 nodes, respectively, and use political leanings as sensitive attributes and misinformation spreaders as anomaly labels. We demonstrate that our FairGAD datasets differ significantly from the synthetic datasets used by the research community. Using our datasets, we investigate the performance-fairness trade-off of nine existing GAD and non-graph AD methods when combined with five state-of-the-art fairness methods. Our code and datasets are available at https://github.com/nigelnnk/FairGAD
LG · Aug 6, 2025
One Small Step with Fingerprints, One Giant Leap for De Novo Molecule Generation from Mass Spectra
Neng Kai Nigel Neo, Lim Jing, Ngoui Yong Zhau Preston et al.
A common approach to the de novo molecular generation problem from mass spectra involves a two-stage pipeline: (1) encoding mass spectra into molecular fingerprints, followed by (2) decoding these fingerprints into molecular structures. In our work, we adopt MIST (Goldman et al., 2023) as the encoder and MolForge (Ucak et al., 2023) as the decoder, leveraging additional training data to enhance performance. We also threshold the probabilities of each fingerprint bit to focus on the presence of substructures. This results in a tenfold improvement over previous state-of-the-art methods: on MassSpecGym (Bushuiev et al., 2024), the correct molecular structure is generated as the top-1 prediction for 31% of mass spectra and within the top-10 predictions for 40%. We position this as a strong baseline for future research in de novo molecule elucidation from mass spectra.
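The thresholding step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the encoder outputs one probability per fingerprint bit; the function name `binarize_fingerprint` and the cutoff of 0.5 are hypothetical choices made for illustration, not values taken from the paper.

```python
from typing import List, Sequence


def binarize_fingerprint(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn per-bit probabilities into a binary fingerprint.

    A bit is set to 1 (substructure treated as present) when its predicted
    probability reaches the threshold. The default of 0.5 is an assumption
    for this sketch, not a value reported by the authors.
    """
    return [1 if p >= threshold else 0 for p in probs]


# Hypothetical encoder output for a four-bit fingerprint.
example_probs = [0.92, 0.08, 0.51, 0.49]
example_bits = binarize_fingerprint(example_probs)
print(example_bits)  # [1, 0, 1, 0]
```

The binary vector, rather than the raw probabilities, would then be handed to the decoder stage, so that the decoder sees only which substructures are considered present.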
CHEM-PH · Nov 17, 2018
Chemical Structure Elucidation from Mass Spectrometry by Matching Substructures
Jing Lim, Joshua Wong, Minn Xuan Wong et al.
Chemical structure elucidation is a serious bottleneck in analytical chemistry today. We address the problem of identifying an unknown chemical threat given its mass spectrum and its chemical formula, a task that might take well-trained chemists several days to complete. Given a chemical formula, there could be over a million possible candidate structures. We take a data-driven approach to rank these structures by using neural networks to predict the presence of substructures given the mass spectrum, and matching these substructures to the candidate structures. Empirically, we evaluate our approach on a data set of chemical agents built for unknown chemical threat identification. We show that our substructure classifiers can attain over 90% micro F1-score, and that we can find the correct structure among the top 20 candidates in 88% and 71% of test cases for two compound classes.
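The ranking idea described in the abstract, matching predicted substructures against candidate structures, can be sketched as follows. The scoring rule here (add the predicted probability when a candidate contains a substructure, and one minus that probability when it does not) is an assumption chosen for illustration; the abstract does not specify the authors' scoring function. The substructure labels and candidate identifiers are likewise hypothetical.

```python
from typing import Dict, List, Set, Tuple


def score_candidate(predicted: Dict[str, float], present: Set[str]) -> float:
    """Agreement between predicted substructure probabilities and a candidate.

    Assumed scoring rule for this sketch: reward p when the candidate
    contains the substructure, and 1 - p when it does not.
    """
    return sum(p if name in present else 1.0 - p for name, p in predicted.items())


def rank_candidates(
    predicted: Dict[str, float], candidates: Dict[str, Set[str]]
) -> List[Tuple[str, float]]:
    """Return (candidate id, score) pairs sorted from best to worst match."""
    scored = [(cid, score_candidate(predicted, subs)) for cid, subs in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical classifier output: probability that each substructure is present.
predicted_probs = {"P=O": 0.9, "C-F": 0.2, "S-C": 0.7}

# Hypothetical candidates sharing one formula, each described by its substructures.
candidate_structures = {
    "A": {"P=O", "S-C"},
    "B": {"C-F"},
    "C": {"P=O", "C-F"},
}

ranking = rank_candidates(predicted_probs, candidate_structures)
ranked_ids = [cid for cid, _ in ranking]
print(ranked_ids)  # ['A', 'C', 'B']
```

In practice the candidate set would be enumerated from the chemical formula and each candidate's substructures extracted with a cheminformatics toolkit; both steps are outside the scope of this sketch.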