CL · Jun 25, 2025

Bridging Compositional and Distributional Semantics: A Survey on Latent Semantic Geometry via AutoEncoder

arXiv:2506.20083v3
h-index: 7
Originality: Synthesis-oriented
AI Analysis

The paper addresses the gap between symbolic and distributional semantics, with the aim of improving interpretability and generalization in AI language models, but as a survey its contribution is incremental.

This survey tackles the problem of integrating compositional and symbolic properties into distributional semantic spaces to improve Transformer-based language models, by reviewing autoencoder architectures and their latent geometries.

Integrating compositional and symbolic properties into current distributional semantic spaces can enhance the interpretability, controllability, compositionality, and generalisation capabilities of Transformer-based auto-regressive language models (LMs). In this survey, we offer a novel perspective on latent space geometry through the lens of compositional semantics, a direction we refer to as "semantic representation learning". This direction enables a bridge between symbolic and distributional semantics, helping to mitigate the gap between them. We review and compare three mainstream autoencoder architectures, namely the Variational AutoEncoder (VAE), the Vector Quantised VAE (VQVAE), and the Sparse AutoEncoder (SAE), and examine the distinctive latent geometries they induce in relation to semantic structure and interpretability.
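As a rough illustration of why the three architectures named above induce different latent geometries, the sketch below contrasts their bottlenecks in NumPy: a continuous Gaussian sample (VAE), a nearest-neighbour lookup in a discrete codebook (VQVAE), and a sparse overcomplete code (SAE). It is not taken from the survey. All names, shapes, and the random stand-in for encoder outputs are hypothetical, and the top-k sparsity rule is only one common SAE variant, assumed here for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for encoder outputs: 4 sentences, 8 features each.
h = rng.normal(size=(4, 8))


def vae_latent(h, rng):
    """Continuous latent: treat h as [mean | log-variance] and sample
    with the reparameterisation trick."""
    mu, logvar = h[:, :4], h[:, 4:]
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


def vq_latent(h, codebook):
    """Discrete latent: snap each vector to its nearest codebook entry."""
    dists = ((h[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return codebook[idx], idx


def sae_latent(h, W, b, k):
    """Sparse latent: overcomplete ReLU code, keeping at most the k
    largest activations per row (assumed top-k variant)."""
    a = np.maximum(h @ W + b, 0.0)
    thresh = np.sort(a, axis=1)[:, -k][:, None]
    return np.where(a >= thresh, a, 0.0)


# Hypothetical parameters, randomly initialised for illustration only.
codebook = rng.normal(size=(16, 8))   # 16 discrete codes
W = rng.normal(size=(8, 32))          # 8 -> 32 overcomplete dictionary
b = np.zeros(32)
k = 4

z_vae = vae_latent(h, rng)            # dense point in a continuous space
z_vq, idx = vq_latent(h, codebook)    # one of 16 discrete codes per input
z_sae = sae_latent(h, W, b, k)        # mostly-zero activation vector

print(z_vae.shape, idx, np.count_nonzero(z_sae, axis=1))
```

In this toy setting the VAE latent is a dense vector that varies smoothly, the VQVAE latent is fully described by an integer index into the codebook, and the SAE latent has only a few active dimensions per input. These are the structural differences the survey relates to semantic structure and interpretability.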

