69.7LGMay 7
Unlocking High-Fidelity Molecular Generation from Mass Spectra via Dual-Stream Line Graph DiffusionXujun Che, Xiuxia Du, Depeng Xu
De novo molecular generation from tandem mass spectra is a challenging inverse problem whose core difficulty lies in the circular dependency between atom-level and bond-level reasoning: determining a bond's type requires knowing its endpoint atoms' chemical environment, yet an atom's environment is in turn defined by its incident bonds. Existing graph diffusion methods process atoms and bonds within a single computation stream, where atom-bond information synchronization can only occur implicitly across layers. We argue that this single-stream paradigm, rather than the choice of any particular aggregation kernel, is a key architectural bottleneck. We propose DualLGD (Dual-stream Line Graph Diffusion), which reformulates molecular graph denoising as the alternating solution of two coupled subproblems: atom-level reasoning and bond-level reasoning, each operating in its own dedicated representation space. The line graph provides a natural mathematical construction for the bond space, in which bond angles, dihedrals, conjugation chains, and rings correspond to local topological motifs between bonds. Incidence-constrained bidirectional cross-attention synchronizes the two streams at every layer, ensuring that each atom attends only to its incident bonds and vice versa, respecting the fundamental chemical principle that an atom's environment is determined by its bonding context. On the NPLIB1 and MassSpecGym benchmarks, DualLGD achieves top-1 accuracy of 34.37\% and 23.89\%, approximately $3\times$ the previous state of the art. Ablation studies confirm the architecture as the primary source of improvement: DualLGD without any pre-training already surpasses the previous best fully pretrained model.
QMJan 2
Comparative Analysis of Formula and Structure Prediction from Tandem Mass SpectraXujun Che, Xiuxia Du, Depeng Xu
Liquid chromatography mass spectrometry (LC-MS)-based metabolomics and exposomics aim to measure detectable small molecules in biological samples. The results facilitate hypothesis-generating discovery of metabolic changes and disease mechanisms and provide information about environmental exposures and their effects on human health. Metabolomics and exposomics are made possible by the high resolving power of LC and high mass measurement accuracy of MS. However, a majority of the signals from such studies still cannot be identified or annotated using conventional library searching because existing spectral libraries are far from covering the vast chemical space captured by LC-MS/MS. To address this challenge and unleash the full potential of metabolomics and exposomics, a number of computational approaches have been developed to predict compounds based on tandem mass spectra. Published assessment of these approaches used different datasets and evaluation. To select prediction workflows for practical applications and identify areas for further improvements, we have carried out a systematic evaluation of the state-of-the-art prediction algorithms. Specifically, the accuracy of formula prediction and structure prediction was evaluated for different types of adducts. The resulting findings have established realistic performance baselines, identified critical bottlenecks, and provided guidance to further improve compound predictions based on MS.
CVSep 16, 2025
BiasMap: Leveraging Cross-Attentions to Discover and Mitigate Hidden Social Biases in Text-to-Image GenerationRajatsubhra Chakraborty, Xujun Che, Depeng Xu et al.
Bias discovery is critical for black-box generative models, especiall text-to-image (TTI) models. Existing works predominantly focus on output-level demographic distributions, which do not necessarily guarantee concept representations to be disentangled post-mitigation. We propose BiasMap, a model-agnostic framework for uncovering latent concept-level representational biases in stable diffusion models. BiasMap leverages cross-attention attribution maps to reveal structural entanglements between demographics (e.g., gender, race) and semantics (e.g., professions), going deeper into representational bias during the image generation. Using attribution maps of these concepts, we quantify the spatial demographics-semantics concept entanglement via Intersection over Union (IoU), offering a lens into bias that remains hidden in existing fairness discovery approaches. In addition, we further utilize BiasMap for bias mitigation through energy-guided diffusion sampling that directly modifies latent noise space and minimizes the expected SoftIoU during the denoising process. Our findings show that existing fairness interventions may reduce the output distributional gap but often fail to disentangle concept-level coupling, whereas our mitigation method can mitigate concept entanglement in image generation while complementing distributional bias mitigation.