Hegang Chen

AI
h-index7
3papers
25citations
Novelty60%
AI Score40

3 Papers

CLAug 27, 2024
Self-supervised Topic Taxonomy Discovery in the Box Embedding Space

Yuyin Lu, Hegang Chen, Pengbo Mao et al.

Topic taxonomy discovery aims at uncovering topics of different abstraction levels and constructing hierarchical relations between them. Unfortunately, most of prior work can hardly model semantic scopes of words and topics by holding the Euclidean embedding space assumption. What's worse, they infer asymmetric hierarchical relations by symmetric distances between topic embeddings. As a result, existing methods suffer from problems of low-quality topics at high abstraction levels and inaccurate hierarchical relations. To alleviate these problems, this paper develops a Box embedding-based Topic Model (BoxTM) that maps words and topics into the box embedding space, where the asymmetric metric is defined to properly infer hierarchical relations among topics. Additionally, our BoxTM explicitly infers upper-level topics based on correlation between specific topics through recursive clustering on topic boxes. Finally, extensive experiments validate high-quality of the topic taxonomy learned by BoxTM.

AISep 13, 2025
Exploring the Paradigm Shift from Grounding to Skolemization for Complex Query Answering on Knowledge Graphs

Yuyin Lu, Hegang Chen, Shanrui Xie et al.

Complex Query Answering (CQA) over incomplete Knowledge Graphs (KGs), typically formalized as reasoning with Existential First-Order predicate logic with one free variable (EFO\textsubscript{1}), faces a fundamental tradeoff between logic fidelity and computational efficiency. This work establishes a Grounding-Skolemization dichotomy to systematically analyze this challenge and motivate a paradigm shift in CQA. While Grounding-based methods inherently suffer from combinatorial explosion, most Skolemization-based methods neglect to explicitly model Skolem functions and compromise logical consistency. To address these limitations, we propose the Logic-constrained Vector Symbolic Architecture (LVSA), a neuro-symbolic framework that unifies a differentiable Skolemization module and a neural negator, as well as a logical constraint-driven optimization protocol to harmonize geometric and logical requirements. Theoretically, LVSA guarantees universality for all EFO\textsubscript{1} queries with low computational complexity. Empirically, it outperforms state-of-the-art Skolemization-based methods and reduces inference costs by orders of magnitude compared to Grounding-based baselines.

LGJul 11, 2025
scE$^2$TM: Toward Interpretable Single-Cell Embedding via Topic Modeling

Hegang Chen, Yuyin Lu, Zhiming Dai et al.

Recent advances in sequencing technologies have enabled researchers to explore cellular heterogeneity at single-cell resolution. Meanwhile, interpretability has gained prominence parallel to the rapid increase in the complexity and performance of deep learning models. In recent years, topic models have been widely used for interpretable single-cell embedding learning and clustering analysis, which we refer to as single-cell embedded topic models. However, previous studies evaluated the interpretability of the models mainly through qualitative analysis, and these single-cell embedded topic models suffer from the potential problem of interpretation collapse. Furthermore, their neglect of external biological knowledge constrains analytical performance. Here, we present scE2TM, an external knowledge-guided single-cell embedded topic model that provides a high-quality cell embedding and strong interpretation, contributing to comprehensive scRNA-seq data analysis. Our comprehensive evaluation across 20 scRNA-seq datasets demonstrates that scE2TM achieves significant clustering performance gains compared to 7 state-of-the-art methods. In addition, we propose a new interpretability evaluation benchmark that introduces 10 metrics to quantitatively assess the interpretability of single-cell embedded topic models. The results show that the interpretation provided by scE2TM performs encouragingly in terms of diversity and consistency with the underlying biological signals, contributing to a better revealing of the underlying biological mechanisms.